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This thesis is an investigation of problems in information 
retrieval and the system design of a computer system well matched 
for information retrieval. 

Performance of retrieval systems, nar^oly, recall and precision 
can be iraproved by using multi-*-attributes for inf<^mation represen- 
tation and user feedback aJrout relevant and irrelevant retrieval. 

1)01 algorithm to select the set. cf .:;:::tributc.3 for a data base hcis 
been proposed in this thesis. The alc;orit.'.iiM ic L-ased on the in- 
formation obtained, from statistic:: 1 CC' -occurencG of attributes in 
the delta base and v.iien .applied tr a .^lample diita base gives the 
keywords and citetiens as the two nost important attributes for 
library information system. The improved performance obtained in 
feedback retrieval process as compered to non' fe^adback processes 
has been studied and formalized. 
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Considerincj these aspects of retrieve 1 problems a special 
purpose coiaputer system is proposed. The motivation for this 
design is tlix’ee fold': 

In inf ermatievi retrieval,, matching of documents 
with th.e query need not be sequential. This kind 
of “inherent parallelism/' in retrieval problems 
should be exploited. 

There is a hierarchy of operations involved in 
inforKi.ation retrieval like query analysis, retrieval 
operation and output presentation, and each of these 
has its o\n\ special requirements . It is desirable 
to study the feasibility of design of a coraputer 
system well matched to each of these levels. 

There is an increasing need for computing pewer at 
a lower cost due to the rate of information grovrth 
and need for improved performance. Mso the multi 
attribute system and feedback retrieval ccjuld. improve 
the performance at the cost cf computing time, 

A computer system with a hierarchy of processors is proposed. 
This system has a preprocessor operating in a time sharing mode 
fer query analysis and managing the second level processors. 

The second level of tiie system h^'.s one c>r morv,; )orocussors which 
perform the retrieval f'unction o.nd select pointers to the 
relevant documents. These second level prc'Cessors have been 
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economically designed by syfecializing their function. The 
output processor in its simplest form, can be a vsoft copy 
display. 

The main difference botv^een this <''rchitGcture and 
other array processing machines such as SOLOMON and ILLIAC IV 
is that inter processor control and comraunication is only 
through the central controller. This modularises the system 
and endows it v/ith flexibility to moot growing requirements. 
Performance degradation which might occur due to the lack of 
direct communication bo-ti-jeen. the parallel processors can be 
reduced by a suitable file orgeanization technique which 
minimizes the need for such comr-i uni cation. The proposed 
architecture is described in Pn,S and ISP notations v^hich are 
developed by Bell and Nev;ell to describo computer systems. 

A system level simulation of this conouter architecture 
has been carried cut with a prorran irritton in POrTPAl-T. System 
characteristics like arrival rate of GUGtr;o,err» and computational 
time required per customer are inputs to this simulator. The 
number of parallel jprocessors ^ the nuiubor of channels for 
file access^ and the number of output channels are the system 
parameters. Average waiting time of the ciistomers, response 
time and utilization of the resources constitute the outputs of 
this siraulator. The results of simulation are used in developing 
a methodology for the system d.-ci'-n. The use of this simulator 
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general system design of this kind is discussed. The use of 
an analytical model V7hich supnloniGnts this simulation model 
has been considered. 

By making ttie second level garallvol processors highly 
special purpose a vired in logic for comp.letely hardware 
oriented search is proposed, ?. coded representation of the 
documents and its use in hardware orivcntod, s»-jt?.rch ar-'; discussed. 
In tiiis method the cost of the parallcjl processors is no more 
than the cost of a fev? gates and registers. 

The main conclusion of this thesis is •” 

It is feasible to design a hierarchical special purpose 
parallel processing computer system xtoII matched for information 
retrieval problems. Further it may be made modular to meet 
the growing needs of an information retrieval environment. 
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CHAPTER I 


INTRODUCTION 

1 , 1 Need for Information Retrieva l 

Information retrieval, is the selective recall of stored 
knowledge. The rapid growth of inf onaation in recorded, form 
has necessitated the use of automatic computers as a tool for 
information retrieval. Technical publication is doubling 
every three years. Abstract journals have been suggested as 
a means of uncovering the contents of a variety of technical 
publication and presenting them to the scientific community. 
Today there are more than ten thousand technical journals and 
over three hundred abstracting journals. It has been sugges~ 
ted that perhaps the next step should be the creation of an 
abstract journal that abstracts abstract journals (110) i 

It has been estimated that 30% of the efforts spent ^ in 
terms of the cost at MIT alone is in redoing things which have 
been done elsewhere due to the nonavailability of adequate 
information service centres (63) , Besides technical informa- 
tion projects areas like medicine ^ government and business 
have already started using computers to meet the increasing 
volxime of information and retrieval (101) . 

Over five hundred hospitals were using automatic moans 
in accounting procedures by 1968. Computers are, recently 
being used to handle medical records of patients. A typical 
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medical information system has more than 1,25,000 examina- 
tion records and can be accessed on line (101) . The MEDLAR 
system for medical literature analysis and retrieval is a 
major achievement in this direction (59) » lu is centered at. 
the National Library of Medicine and planned from the star;t 
to be expanded into a decentralized organizatlono The acti- 
vities of MEDTjARS include indexing , search , maintenance and 
improvement of the ' Libraries % the thesaurus of Indexing 
Terms and preparation of a medical glossary. 

The government systems for information retrieval Include 
motor vehicle, law enforcement, criminal justice, legislative, 
and- urban planning and control. It has been estimated that 
motor vehicle applications v/ould typically have over thirty 
million records. The system will have hundreds of terminals 
for inquiry by courts, law enforcement and insurance agencies 
(73) . A typical police information network is required to 
provide response, in thirty seconds when inquiry is made to a 
file of 2,00,000 warrants of arrests (101). In all, there are 
more than 4,50,000 papers in standard TUaorican technical 
journals and 1,00,000 informal Government reports published 
every year and -this number is steadily increasing (111) , 

The literature on information systems is almost completely 
dominated by business applications of large real time informa- 
tion systems. Maintenance and analysis of policy records 
in insurance area, on-line ticket reservation in air traffic 
and order-inventory systems are some examples in. business 
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application where computers are already in use to handle 
large volumes of data (101, 68), 

The above discussions motivate launching a large-scale 
program to automatic information retrieval, Bar™Hillel (5) 
challenges the argument about technical information explosion 
and questions whether information retrieval is in fact approa- 
ching a crisis. From his point of view specialization has 
been the defensive mechanism that ha.s evolved to combat the 
growth of technical publications. However, he himself is 
convinced that the growth of information in the field of law 
and medicine need special attention. Often the 'information' 
explosion is stressed more as the major reason for automatic 
information retrieval, but the need for quick response and 
man-machine interaction in decision making are not given due 
consideration (95) . 

1.2 Information Retrieval System 

An information retrieval system has two major sections 
namely, infomation storage and information rotrioval. The 
information store is the raw material for retrieval. Infor- 
mation and queries both are normally available in the form 
of texts. For economical and effective searching, they are 
represented in an automated system through a set of “attributes’ 
or "keys". Through long experience librarians have fomd that 
direct extraction of keys from the original text does not 
necessarily lead to effective retrieval. (By effective 


4 


retrieval we mean that almost all relevant documents to a 
query are retrieved and no irrelevant document is retrieved) 
Hence it is necessary to modify the attributes of the text 
to standardise them in some sense (109) . TJhen a docxament is 
represented by a set of such standardised attributes, it is 
said to be indexed. 

The indexed documents or texts in a conventional library 
are classified into ono or more groups according to a classi- 
fication rule. The classification rule vcould normally group 
all related documents into one group. T'iJhen a document falls 
into more than one group, cross references are given to the 
other relevant groups to which it belongs (89), The main 
aim of classification is to minimize the effort needed in 
retrieval. In automatic systems also, the indexed documents are 
organized into a number of groups and stored in a storage 
medium, accessible to the machine. This classification and 
storage in machine accessible form are together known as "file 
organization”. Thus, the raw material for information retrie- 
val process is a set of indexed and organised files. 

Then, tl:.e information retrieval process can bo represented 
as shown in Figure 1.1. The query records from users for 
information are first standardized by using the same procedures 
as in document text standardization. The transformed qUery 
is used to "match" with the information items in the store 
in order to select the set of relevant documents , Often the 


5 



Figure 1.1 Information storage and retrieval 
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transformed document texts as in the storage file may not be 
in a readable form for the requester. In such cases the 
retrieval system selects a set of pointers uhich refer to their 
corresponding readable texts ^ which might be stored in a 
separate file. Either a soft copy or hard copy of the selected 
doc\iraent texts are presented to the user depending on the need 
and available facility. 

In summary, the main components of an information retrieval 
system are; 

(i) an indexed and organized file. 

(ii) a retrieval method for matching -the transformed 
query with the documents, 

(iii) a query language in which the user states his 
needs and commmicates with the system. 

(iv) a set of user terminals with facilities for 

entering the query and receiving the response. 

In a complex information system design these factors cannot be 

considered independently. For instance, file organization and 

retrieval method have a number of common factors (109) , Simi~ 

larly the kind of user terminals, user groups and document 

types in storage are to be considered jointly in developing a 

query language. 

1.3 On line retrieval systems 

Batch processing of retrieval requests is simple, but 
unsatisfactory for a number of reasons (95) , Further, the 
quick response required (in the order of seconds) in military. 
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medical and legal information systems has led to the develop- 
ment of "on-line information systems". The concept of time 
sharing and interactive processing with third generation 
computers have made it feasible to design the on-line systems 
like the one in project r4AC (54) . SI®.RT system (94) is 
another example. Licklidor (62) in his Libraries of the future 
envisions a situation in which an user sitting at a desk like 
inquiry station, connected to an on-line computer system with 
a large memory store, can call for information, receive it on 
a cathode ray tube display scope, modify it, and save it for 
future reference. 

The on-line user feedback could be effectively used in 
improving the performance of retrieval systems. For example, 
displaying the selected part of t]-. ;::nurus may be useful in 
query formulation. Feedback about relevant and irrelevant 
retrieval would be useful in the retrieval process. The results 
obtained from various experiments with SMART system have shown 
that the performance of retrieval systems could be improved over 
30% on an average, with user feedback (18) . 

There are three distinct types of time-sharing installa- 
tions in practice. The first type (and the one meant more 
often than not when the term time sharing is used) is the 
"general purpose, time sharing system" (99) . A general purpose 
time sharing system attempts to provide each user with the full 
rarige of capabilities of a general purpose computer. A second 
type of system sometimes goes by the name "Dedicated sys^tem". 
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Here the user is constrained to use but a single language (29) . 

The third class of system is the one in which the language of 
the user is further restricted and specialized to deal with a 
restricted set of problems. The SAGE air defence system and 
the SABRE air line reservation system are some examples of this 
kind (80) . 

The need for sharing a comir.on data base like technical 
literature, medical data, legal information or business data 
among the wider group is constantly increasing. Nationwide 
information networks and international information transfer have 
been projected (79) . This thesis is the study of a special 
purpose computer system for information retrieval which will be 
useful in the various application areas mentioned above. 

1.4 Motivation for a Special Purpose Computer System 

The early development of computers and the resulting appli- 
cations have been in the numerical computation of functions. 
Developments in procedure oriented languages (46) and operating 
systems have enhanced the number of users of this facility. 

Hardware developments in areas like semiconductor technology for 
logic and core technology for memories have made the design of 
large computer systems feasible for this purpose. Hardware 
developments have contributed not only to the improvement in 
reliability of operation of such a large system, but also 

towards reducing their cost to acceptable limits (104) . As • ; 

j 

the nvunber of computer installations increased, they begem 

to be used for solving problems in many areas other than numerical i 
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computation. Applications of computers in non-numoric 
problems (which basically do not involve numeric calculations) 
like picture processing information retrieval and pattern 
recognition are some examples (36). As a result special 
program packages and problem oriented languages have been 
developed for this purpose (97) „ All these attempts have 
been in developing software means to use the rrachinsp which 
is designed for nianeric computation in solving non-numcric 
problems. Attempts in designing a computer system for non- 
numeric applications which would be structurally different 
from a numeric machine were rare. The main reasons for this 
could be, the imbalance between the cost of such a system and 
its demand which existed in the last decade. Further, the 
cost of software was a small fraction of the hardware cost 
of computer systems. 

The advent of integrated circuits and their application 
in computer industry have changed the imbalance between hard- 
ware and software costs (35) . The hardware cost is showing 
a decreasing trend whereas the cost of software is constant 
or even increasing. The hardware cost of C.P.U. has been 
estimated to be three percent of the total cost of recent 
systems (4). Semiconductor memories costing 1,5 cents per 
bit have been projected with developments in LSI (98) . As 
a result of these developments, computer systems with more 
than one processor and memory (unlike the conventional 
Von Neumann's xani^'rrecssor computers) have been conceived. 
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A parallel processing system with as much as 256 processors, 
each with 4K memory has been designed at the University of 
Illinois for solving problems in partial differential equa- 
tions and weather forecasting (7) „ 

Having been motivated by the feasibility of sych special 
purpose system design, this thesis considers a computer system 
well matched for information retrieval applications. This 
study is further supported by the increasing applications of 
computers for retrieval of information not only in library 
information systems but also in areas liko , military informa- 
tion system, medical. Government, business and legal informa- 
tion systems. 

The special purpose computer organization discussed in 
this thesis considers the following special nature of informa- 
tion retrieval problems . The problem of information retrieval 
has an "hierarchy of functions" to be performed in a sequence, 
namely, query pre-processing, selection of relevant documents 
to the query and display of the selected documonts to the 
requesters (See Section 2,4). The architecture considered in 
this thesis has three levels of processors well matched at 
each level for the function being performed. Secondly, infor- 
mation retrieval problems have "inherent parallelism" (see 
Section 2.4) which can be exploited by providing more than 
one processor to operate in parallel. Each of these parallel 
processors will be restricted in their operations to handle 
particular data types. This would reduce the complexity and 
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the resulting cost of such processors. 

The proposed architecture with a set of processors 
operating in parallel has been simulated on IBM 7044. The 
use of this simulator in designing a coiapucer system to meet 
the given requirements of an information system like the 
arrival rate of customers and desired response time from the 
system^ is presented. This simulation, shows that by increas- 
ing the number of specialized parallel processors it is possible 
to get a response time of the order of a few minutes as required 
in on-line information systems (73) . Such computer systems will 
be useful in feedback information retrieval systems where the 
on-line interaction with the requester is used in the retrieval 
of desired information. 

By restricting to a particular indexing and coding scheme , 
it has been shown in this thesis, that the cost of a typical 
parallel processor of the kind mentioned earlier is essentially 
governed by the cost of a few gates and registers. 

As the retrieval operation in this architecture is done 
by a separate set of processors operating in parallel , the 
system is modular. T/71ien the work load increases (in terms of 
number of matching required) it is possible to maintain the 
same grade of service (in torms of response time) by suitably 
increasing the namber of parallel operating elements in the 
system. This increase in work load may be an outcome of 
increasing volume of information to be handled or the use of 
special techniques like multiple attributes for information 
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representation in order to improve the performance of the 
retrieval system. 

1. 5 - Outline of tlic Th o,:.'.s 

In Chapter II a brief review of the literature is presen- ■ 
ted. As this thesis is a discussion of a special purpose 
computer system for information retrieval, tho review is divided 
into two sections. The literature on information retrieval 
system is vast. A brief discussion of the various problems 
in information retrieval and the results in each area have 
been presented. The second section is an over view of the 
parallel processing systems proposed in the literature for 
various applications. A proposal for a special purpose computer 
system for information retrieval with due considerations to 
the parallelism in the problem and hierarchy of operations 
involved is given. 

Chapter III is a summary of some results obtained in 
multi -attributes in retrieval process and user feedback as means 
of improving the performance. The contents of this chapter 
when considered with the growing voluim.e of recorded information 
have formed the motivation for the special purpose computer 
system considered in this thesis. 

The details of the proposed computer architoctvxe are 
presented in chapter IV. This architecture has a hierarchy 
of three processors in the second level of which there can be 
more than one processor operating in parallel. The notations 
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developed by Bell and Nowell (12) in describing a computer 
system have been used to describe the proposed architecture. 

The advantages of this system and a typical instruction set 
for the second level processor arc given. The need for 
considering the first and third level processors as special 
purpose time sharing systems in reducing the cost of the 
total system has been discussed. 

Chapter V presents the details of a system level simula~ 
tion of the proposed architecture. This simulator has been 
written in FORTRAN IV. The common problems involved in the 
simulation of a multiprocessor system and concurrent events 
are discussed. A brief description of the? i''iJL;,Tf ’'Rand a 
methodology in using the simulator for system design are 
considered. A number of possible modifications for the simula- 
tor to meet the special conditions are indicated. A dis- 
cussion about formulating the system design problem as a 
mathematical programming problem and its shortcomings are 
pointed out. 

In Chapter VI the SUI-KniliroSED coding scheme developed 
by Huskey and Files (32) is used. With this coding scheme 
and a parallel read disk^ it has been shown that the second 
level processors in the above architecture are much simpler. 

In fact, the cost of a typical second level processor, in that 
case is the cost of a few gates and registers. The price, for 
this reduced cost is in terms of the flexibility and performance 
of the retrieval system. 



CHAPTER II 


PREVIEW OP LITERATURE AND STATEIffiNT OF THE 

PROBLEM 


2 . 1 Introduction 

The term "’information retrieval" was first coined by Calvin 
N. Mooers in the year 1960 at a Computer Conference (74) . Since 
then there are a uxomber of activities and the term itself has 
assumed different connotations. Tho main reason for these 
varied connotations is the difficulty in defining the term "in- 
formation" in a general context. Attempts have been made to use 
Shannon's (103) definition of information with little success. 
The semantic value and the form of information are also involved 
here, rather than the mere uncertainty which determined the 
information content in Shannon's theory. As a result, a number 
of interpretations to information retrieval exist in the litera- 
ture and the most commonly used ones are given below. 

An information retrieval system which retrieves complete 
texts of doexomonts or doexxment surrogates like abstracts are 
coitanonly known as "docxijnent rerrioval system " (5 8) . A, system 
that provides only citations of relevant d.ocumcn'i?;? is "reference 
retrieval system". A system retrieving words or nxmbers as 
response to a query is called "data retrieval system". For 
example a manufacturing company may have a data retrieval system 
to display on request its sales records, inventory items and 
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personnel files etc. Such data retrieval systeras combined, 
with other management functions have come to be known as 
"management information system" (1) . A fourth type called 
"fact retrieval systeras" or "question ansvrering systems" 
deal with answers to specific questions (110) „ For example, 
"what is the area of the triangle whose sides are x,y,z units 
long?" is answered by such systems. But their area of dis- 
course or subject coverage is very much limited as compared 
to document or reference retrieval. For instance, a typical 
question answering system covers topics on high school algebra 
(16) whereas a keyvrord system could cover wider areas like 
computer sciences or electrical engineering and so on. However, 
document retrieval and reference retrieval are the two common 
usage of the term "information retrieval". 

In all these information systeras, we have the system con- 
taining the organized infomation on the one side and the users 
with queries on the other side. Thus it is possible to picture 
the activities of information systems in an unified way, as 
shown in Figure 2.1. 

The following are the main aspects in the study of an 
automatic information retrieval system? 

(i) Automatic indexing and abstracting 

(ii) Automatic classification 

(iii) Semantic problems 

(iv) Retrieval methods and file organization 

(v) Dissemination of information 

(vi) Evaluo.tion and feedback 

(vii) Retrieval models, and 
(^^iii) Information networks. 
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2 » 2 Information Retrieval - A Brief Review 

The state of art in the above mentioned sub-areas of an 
information retrieval system are briefly discussed below: 
Automatic Abstracting and Indexing 

Indexing is the process of selecting the attributes of 
the information item„ The porfermanoo of a ratrioval system 
depends on indexing and should cater to the ntiods of the 
majority of the users » 

There are many kinds of indexing (9) in practice. In 
coordinate indexing, each item of inform.'ition is represented 
by a set of keywords. The selection of key words is normally 
controlled by an "authority list" of keywords. In a dynamic 
system this list needs constant modification. In this system, 
only keywords are represented but not the relation between 
them. Often keyword systems with tree and graph structures 
are used to account for the relation between keywords (96) . 

The citation index (37) using the bibliographic coupling 
between documents is another tool for inf' '.rmr.tion retrieval. 
The bibliographic coupling used by Kessler (53) is based on 
the number of common citations and the nurober of co-citations. 
Maron and Kuhns (66) have defined yet another indexing called 
"probabilistic indexing". In this, each index terra is asso- 
ciated with a relevance number in the interval (0,1) , which 
indicates the probability of that index term representing th© 
document. 
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The problem of abstracting is one level higher in the 
hierarchy and representative sentences are used to describe 
the document content, rather than the keywords » Automatic 
abstracting and indexing methods (30) use the relative frequency 
of keywords. Automatic abstracting, using the introduction and 
conclusion paragraphs have been fovind comparable v/ith manual 
abstracting (52) . Index terms gd.ven by the authors in the pub~ 
lications like IEEE and ACM will be a good starting point for 
automatic indexing, though it is not complete by itself. An 
automatic scheme for indexing and abstracting is more consistent 
when compared to the manual methods and avoids undue delay. 
Automatic Classification 

There are two general types of classifications, namely, 
enximerative classification and derived classification (58) . 

The enumerative classification has fixed number of categories 
and the documents are assigned to one or more of these categories. 
It can be a non faceted classification like Dewey Decimal or 
faceted classification like Ranganathan ' s colon classification 
(90), In faceted classification, the depth of classification 
can be to any extent by repeated application of the classifica- 
tion rules and is more suited for automatic means. 

In derived classification, thoi classification groups are 
d«rived automatically from the data base (67) . Statistical 
relation between the documents in the data base are used for 
this purpose (17, 3, 38). 
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Syntax and Semantics Problem 

The two requests regarding '“'transport from Russia to Cuba" 
and "transport from Cuba to Russia" have the same set of key- 
words which causes unwanted retrieval v/hen only one of them is 
required. The relation between keywords in the form of directed 
graphs was first used by the SYNTOL group (27) to avoid such 
problems. This was later modified and called criterion phrase 
by Sal ton (96) in his SISART system. 

By semantic problem, we mean the semantic equivalence of 
descriptors. Construction of the dictionary called thesaurus, 
giving the semantically equivalent descriptors for a given 
descriptor is a challenging problem. User interaction in deter- 
mining the validity of this equivalence is proposed as a part 
of an interactive retrieval system (18) . Another problem which 
occurs due to non-standard word ending, however, is not consi- 
dered to be a semantic problem. For example, computer, computers 
and computers all will have to be treated as equivalent. The 
common word stem obtained after deleting the non standard 
endings like "ed"? "ly" is normally used for this purpose. 
Retrieval Process and File Qroani t i cr^ 

A document is retrieved as a response to a query when the 
relatedness betv;een them exceeds a certain threshold. There 
are a nxamber of relatedness measures used for this purpose (9,50) 
The retrieval process cind file organisation are closely related, 
When all the doctiments of the file (or sub files) are to be 
scanned, a sequential file would be appropriate, A co-ordinate 
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indexing with keyvjord matching on the other hand would need 
a semirandom access file like magnetic disk. The sequential, 
random access and combined file organization have their own 
advantages and disadvantages (22) , The choice of file orga- 
nization is governed by 5 

(i) cost of storage and storage efficiency, 

(ii) cost of updating and its frequency, 

(iii) the retrieval process? and 

(iv) available storage device types liace disk, tape, etc. 

Another kind of information organization and retrieval 
stems from the associative or content addressed memories (86) . 
Processors based on such memories are referred to as "associative 
processors" (60) . As the cost of associative memory is high, 
these processors have them in a limited size and the file is 
processed in chunks. Each chunk can be handled by a content 
addressed memory of appropriate size. 

Information Dissemination 

The user terminal which provides the communication between 
the user and the system is important in an interactive retrieval 
system, in fact, retrieval systens (105) and, f’.o question answering 
systems (91) have such terminals as an integral part of the 
system, CRT display for soft cony output with a keyboard for 
input is commonly used. 

Unlike the ser\'’ice on demand system, H.P. Luhn of IBM 
has suggested a "current awareness system" (64) . This is 
also known as selective dissemination of information (SDI) . 
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In SDI service, the interested users are permanently in the 
system, represented by the "user profile". As new documents 
are added to the system, the interested users are periodically 
notified. Z\n interesting feature of this system is the user 
feedback. The performance of the retrieval system can be 
evaluated and the user profiles modified to suit the needs 
of the users. This is possible because the user profile is 
kept permanently in the system. 

Performance Evaluation 

The performance of an information system can. be considered 
from the point of view of users or system designers. Tho two 
performance measures suggested by Cleverdon (24) are recall 
and precision and continue to be the commonly used measures. 
They are defined belov7 3 

Recall = Kumber of relevant documents retrieved 

Total number of relevant documents in the system 

Precision = Numb e r of relevant documents retrieved 
Total number of documer tsretrieved 

From the above definitions it is obvious that tho recall would 

need the knowledge of the total number of relevant documents 

and precision would need user judgement. Further they depend 

on the exhaustivity or specificity of indexing. Thus a set 

of queries, their recall and precision with different indexing 

schemes would be a measure of the indexing system itself (24) . 
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Various experiments have conclusively shown that there 
exists an inverse relationship between recall and precision 
for a fixed indexing scheme. Lancaster has verified the same 
with the data base obtained from SIEDLAF” '59) aid classifies 
the users into groups who need high recall and acceptably low 
precision and vice versa. 

Attempts have been made to use Shannon ' s formula for 
information transmitted over a noisy channel for the evaluation 
of information retrieval jirocess (70) . 

Interaction betv/een users and the system at the following 
levels have been suggested as means of improving the performance 

(i) In search formulation and instructing the 
user in the use of query language. 

(ii) In determining the synonyms of the keyvrcrds 
used. 

(iii) In examining the partial search result and 
controlling the search process. 

(iv) In controlling the precision-recall trade 
off by making specific or generic queries. 

However, the performance of the retrieval system from the 
management point of view is not given much consideration except 
by Lancaster and Mueller (76) . 

Retrieval models 

Becker and Hayes (9) have studied the problems of infor- 
mation retrieval using Boolean algebra and linear algebra. 

The quantitative studies by Morse (75) of the library system 
uses the notions of probability theory, queuing theory and 
Markov process to model book ordering, circulation and other 
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activities of a library. Assuming the dociments as a collection 
of points in an n-dimensional metric space and using barycentric 
coordinates, Hayes (42) is able to describe the information re- 
trieval process. Sal ton has used a set v;.:.eoretical model for 
information retrieval and studied the inverse relation between 
recall and precision (96) . 

The mathematical models developed in the literature are 
mostly intended to describe existing information retrieval 
systems. Results obtained from them do not seem to have any 
substantial impact on the systems being designed. 

Information Networks 

The inter library loan system is one step towards information 
networks in manual systems. The project INTREX at MIT (79) and th< 
article by Kemney (51) discuss nationwide and international 
information networks. The computer networks (10) project xander- 
taken at CMU as the part of ARP A project will have much relevance 
in this context. 

In summary, the trend in recent experiments and literature 
show that keyword matching and directed graphs are popular 
tools for retrieval. User interaction and feedback in retrieval 
are being considered with the recent time sharing systems. The 
evaluation of retrieval systems still uses recall and precision 
measures and the theoretical models are not adequate enough to 
influence the practical or experimental system designs and 
need deeper study. 
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2 , 3 Parallel Processing Computer Systems -• A reviexv 

Computer systens with special architecture for applications 
in various fields have been considered in the literature. Com- 
puter systems have been developed for airtraffic control (45 ) , 
weather forecasting (102) , solving partial differential equations 
(14) , picture processing (15) and space missions (57) . In this 
section a brief discussion about the parallelism in data proce- 
ssing and a review of computers designed to exploit such parallel! 
are presented. 

Originally the term parallelism was used for parallel trans- 
mission or processing of several bits of a word. Recently the ten 
is used to include any situation where two or more functions are 
performed at the same time (92) , There are two kinds of paralle- 
lisms defined by Lehman (61) s (i) macro parallelism and (ii) micro 
parallelism. When several units of a multi -processing system are 
utilized to process, in parallel, independent sections of a job, 
we exploit the macro parallelism. On the other hand when relative 
independence of individual machine irstri'ctions are exploited 
as in the case of lock ahead machines we call it micro paralle- 
lism. A parallel processing may be considered to be one in which 
macro parallelism is used. By convention the parallel execution 
of I/O is not termed as parallel processing. 

Flynn (34) distinguishes between the following two kinds of 
macro parallelism: 

(i) Single instruction stream - multiple data 
stream (SIMD) . 
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(ii) Multiple instruction stream - multiple data 
stream (MIMD) , 

The terms "instruction stream" and "data stream" mean sequence 
of instructions and data respectively „ 

The SOLOMON (Simultaneous Operation Linked Ordinal Modular 
Network) system has been proposed for SIMD parallelism (106) . In 
this architecture there is a central control unit executing the 
program commands. There are 1024 processing elements (PE) arranged 
in the form of an array. Each PE is connected to four of its 
physically adjacent neighbours. The control signals required to 
execute a command are given to the PE's by the central control 
unit through a network sequencer. Input-output from the P.E. 
network is accomplished by a 32 bit buffer register which has the 
capability to communicate with any specified row or column of 
processors. Each P.E. has two memory frames, each of 2K bits. 
Basically, each P.E. can be in one of the four possible modes 
and mode control is given by the central unit alongwith each 
instruction. If this matches with the P.E. mode state, then 
that command is executed. However, a P.E. , not executing a 
command can provide an operand to one of its foqr neighbours. 

Its application in solving partial differential equation and 
photo reconnaissance in military applications have been 
explored (106) , 

The Illiac IV main structure consists of 256 PE's arranged 
in four reconfigurable SOLOMON type arrays (7) . Each processor 
has 2K, 64 bit/word, 240 nsec thin film memory. It differs 
from SOLOMON in the sense that as much as four different 
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instruction streams can .be processed simultaneously and is 
suited for MIMD parallel processing. Its application in 
matrix operations Fourier analysis and related problems 
have been discussed in (7) , The four possible control 
units of Illiac IV improve the reliability of this system 
as compared to SOLOMON. 

The distributed processor considered by Koczela (57) 
is a general purpose computer system with high reliability and 
is easily expandable or contractable. The organization con- 
sists of a number of identical cells interconnected in a certain 
manner. Each cell consists of a general purpose processor 
section and a small amount of memory (512 words, 16 bits/v7ord) 
on a single LSI wafer. The cells are divided into groups and 
connected by intercell bus, and these groups are interconnected 
by intergroup communication switch. The difference between this 
architecture and SOLOMON or Illiac IV is that cells may be opera- 
ted independently or dependently of the controller cell. Each 
group has a fixed number of cells, of which one will be desig- 
nated as a controller cell. This facilitates the processing of 
a set of tasks or subtasks independently on distinct or the same 
data beses. This kind of parallelism is called "natural para- 
llelism” by Koczela. Each cell has two neighbours and data can be 
shared between them. Similarly the source of instruction can be 
either from its own memory or from the controller cell in the 
group. Further, detection of faults, reconfiguration and its 
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application in space mission have also been discussed in (42) » 

The multi list processor designed by Prywes is yet another 
architecture for information retrieval (04) „ The concepts of 
list processing and associative memory are used to process IPL 
like statements (77) v/ith much greater efficiency. Represen- 
tation of the information in the form of a multi -list for real 
time applications has also been studied by Prywes (05) . 

In the above architecture an associative memory is simula- 
ted on an addressed memory. It has been designed for variable 
length items. A memory synchronizer is placed between processor 
and memory and its functions ares (i) to store and read variable 
length items (ii) to automatically assign memory space and (iii) 
to synchronize the speed of the memory and the processor. The 
synchronizer and the memory together appear as an associative 
memory to the programmer. 

In the above case, the particular formulation of infojnnation 
retrieval as a list processing problem is exploited through 
special hardware for matching variable length records. However, 
the parallelism is not exploited in this design. Though the 
memory synchronizer itself is a processor its function is to 
keep track of available free space and memory is addressed 
always through this synchronizer. However, it is claimed 
that the instructions for variable length data operations and 
the. pseudo associative memory make programming much simpler. 
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The associative parallel processor (APP) proposed by 
Bird at Librascop.e group (15) is an architecture designed for 
picture processing. Associative nvemory is an integral part 
of this machine. The parallelism in pr-iblsms will have to be 
made suitable to this architecture by "sequential-state- 
infomation'= (31) , When many pairs of data are to be added, 
in this system, they are added in parallel by word and serial 
by bit. Each cell in the associative memory has the required 
matching logic and other indicators. In addition to this 
associative memory, the computer has a random access primary 
memory and a processor to control the overall cperation. The 
information stored in data registers are processed in the 
associative processor and results are gated out. Its applica- 
tion in picture processing has been considered in the reference 
given earlier. 

In summary, the special purpose computer architectures for 
parallel processing have been proposed and designed in the 
literature. Array processing computer systems like Illiac IV 
with 256 P.E.'s each P,E. with 4K (64 b/w) memory and 10^ gates 
are feasible v/ith the current technology. However no parallel 
processing configuration for a computer system has yet been 
proposed for information retrieval applications. 

2* ^ Statement of the Problem 

The process of Information Retrieval is achieved in three 


stages. These are; 
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(i) Pre-processing Stage; The query from the user is 
analysedo Synonym tables and file directory are 
consulted to determine the set of physical files 
to be secirched for the given query. 

(ii) Retrieval Stage; The selected files in the’ first 
stage are processed to find the set of pointers 
to the relevant dociioents for the query being 
tjrocessed. 

(iii) Output Stages The selected pointers in the 

second stage are used to pick the excerpts of the 
documents which can be any of abstract, the title, 
bibliographic details, the complete text. The 
selected excerpts are presented to the requester 
in appropriate form. 

The first and the third stages involve interaction with 
slow user terminals. However, the retrieval stage receives its 
input, from the first stage and after processing them passes the 
selected pointers to the output processor. t'7e observe that the 
first and last stages are I/O bound rather than processor bound. 
The case with the second level would very much depend on the 
I/O speed and processing rate. As the file used in this stage 
consists of indexed and coded documents, it is possible to 
store them on high speed back up memories like disks or drums. 

In a computer system designed for information retrieval appli- 
cations it is worth exploiting this hierarchy of functions by 
considering well matched systems at these levels. 
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Consider an information storage with M items or documents 
and each document having N keys» Suppose, a request which 
specifies n attributes for the desired document is given » A 
document is said to be retrieved for this query if the corres- 
ponding attributes of the query and the dociment "match". Then 
it is not necessary for the retrieving machine to look at each 
document in a sequential order for match. In other words, if 
there are M such machines, they can examine the query and M 
documents simultaneously. Since the numl^er of documents, in 
actual situation is large it is not economical to have that many 
machines. However, the documents can be classified into groups 
and these independent groups can be processed simultaneously. 
What we are looking for is a computer system with parallel pro- 
cessing capacity at the second level to exploit this "natural 
parallelism" (57) . 

As a different situation in retrieval operation c<~iasider 
the following type of query; "Does a patent on parallel pro- 
cessing computer for information retrieval exist?" Suppose 
there are M machines processing M different files to find the 
response to this query. If one of them comes up with an affir- 
mative answer, the entire processing for this query can be 
suspended thereafter. Thus one processor in the second level 
might control the other processors working for the same query. 

The SOLOMON architrecture has parallel processing elements. 
The spatial adjacencies between the array of processors and 
the input output schemes are unsuited for the above problem. 
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In processing variable length records of information those 
processors v;hich finish their job earlier will* have to vrait 
for others. Further each P.E., in this architecture is a 
general purpose computing element designed for operation on 
fixed length data. This makes it worse suited, for informa- 
tion retrieval problems with vari.ablc length information items. 
Illiac IV being in the same lines as SCLOI-ION has the same 
disadvantages. These architectures have been suggested for 
problems in partial differential equations and weather fore- 
casting. The distributed processor suggested by Koczela .has the 
independent processing facility. But the grouping of cells and 
the lack of hierarchical processing make them less suited for 
retrieval operations. 

The associative processors discussed in Section 2.3 have 
the same problem as the distributed processors. Further when 
the processing need matching of structured information the 
associative processor would be more complicated and also 
inflexible due to fixed logic. The multi -list processor by 
Prywes (84) is suitable for processing structured information 
but does not take into account the natural parallelism dis- 
cussed earlier. 

It can be shown that multiple attributes and user 
feedback in retrieval improve the performance of retrieval 
systems. The growing rate of information, need for inter- 
active processing and multiple attributes all warrant a high 
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computational power at a lower cost for information retrieval 
problems. The special characteristics of these problems can 
be exploited in a special purpose computer system. 

A computer system for information retrieval should consider 
the following points; 

(i) The natural parallelism in information 
retrieval problems , 

(ii) The hierarchical structure of the problem. 

(iii) Ability to schedule the availal^lc resources 
to the load on demand at any time, in order 
to provide better performcince. 

(iv) A modular structure which can be expanded 

or co>iihtjg!r acted to meet varying requirements. 

(v) Reliability of operation of the system with 
the inherent redundancy in the system design. 

However such an architecture has also its disadvantages. 

Two obvious disadvantages are the initial cost and the special 
purpose design. The trend in LSI (35) will reduce the cost 
in the long run. The growing trend tc; have mechanized informa- 
tion systems and information net;.;orks both at the national and 
international level v/ould provide enough scope for such a 
specialized architecture. 

In this tliesis, a computer system well matched for infor- 
mation retrieval problems with due consideration given to the 
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above points has been proposed, h system level simulation 
gives a methodology for sys'tem design of such an architec- 
ture, It is also possible to reduce the cost of such a system 
by making it highly special purpose as discuGsed in Chapter 


VI. 



CHAPTER III 


MULTI ATTRIBUTES liND USER FEEDBACK IN 
INFORMATION RETRIEVAL 

Improvement in Retrieval Systems 
An item of information indexed and stored in a file for 
retrieval may have more than one "attribute" « These attributes 
are also known as 'keys’. For instance, a document in a library 
information system may have the title, the author (s) , the source 
of publication and the time of publication as its attributes. 

It has been shown by many experiments (93, 53, 69) that biblio- 
graphic information, citation and the journal of pioblication 
are useful attributes in retrieval. Such multi attribute 
representation of information can lead to higher precision 
when the requester knows his needs "precisely". 

The need for precise statement of the query in terms of the 
language used by the system in describing the documents , is an 
important requirement. This precision is very hard to define 
commonly for all users. For example, an active research worker 
may be looking for a particular document whereas a beginner 
in that field may be interested in documents of general interest. 
This is an important consideration as the performance measures, 
namely, the recall and precision involve subjective decisions. 
Thus it has bde'ri' proposed by Salton (96) and others to 
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involve' the requester as part of the process of retrieval. 

Such online interactive retrieval systems seem to have 
advantages when indexing is not perfect. In this chapter 
these factors are discussed in detail and an algorithm is 
suggested for picking the attributes of the da'ba base for an 
information system. The main objective is to emphasize the 
need for high computing power to be availcible to its users in 
an infomation system. The reasons for this need are two 
fold, namely, the multi attribute system of information re- 
presentation and on line retrieval facility for interactive 
search. 

3.2 Multi Attributes for Information Representation 

In keyword indexing system each document is characterised 
by a set of keywords. The number of keywords depends upon the 
depth of indexing. A request in such a system also consists 
of one or more keywords. In many retrieval systems the user 
may assign weights to these keywords and give a threshold for 
selection. In simpler systems only conjunction or disjunction 
of these keywords are pemitted. Conjunction of keywords or 
C 0 “ 0 ccurance improves precision. For instance, the request 
for documents on computer system design is more precise when 
the keywords "computer®' "'systems" and "design" are required 
to co-occur , in a document for retrieval. On the other hand 
diSijunption of these keywords might retrieve more unwanted 
documents. However, if a document, dj deals with some aspects 
of computer design and if the three keywords mentioned above 
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are not co-occuring, the conjunction method would miss it 
whereas the use of disjunction would retrieve it. This 
difficulty arises mainly because the concerned document is 
not viewed by the indexer as it would be done by this 
particular requester. Such mismatches will be henceforth 
referred to as "imperfect indexing". It is particularly 
impossible to achieve perfect indexing as the requirement 
differs from individual to individual. 

The main difficulty in mere keyword oriented search 
stems from the problem of synonyms. The two terms "computers" 
and "calculating machines" may be synonymous in some context 
but not always. For a requester interested in modern computer 
systems design, docviment dealing with early calculating 
machine design is irrelevant. Identification of the validity 
of this synonym, independent of the user is difficult. Under 
such conditions a second attribute to the docxmient, say, time 
of publication would be useful. Either the apriori knowledge 
of the user regarding this attribute or interaction with the 
system which might throw light on this will reduce the irrele*" 
vant documents retrieved. 

Extending this idea of multi attribvites , one is interested 
in knowing what are the attributes of the documents that 
should be used in the file organization and query formulation. 
As in any design, there is a trade off between the nximber of 
attributes and their Cost of inclusion. The inyerse relation 
between recall and precision is shown in Figure 3.1, both for 
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single and multi attfribute systems. It should be observed 
that the performance under multi attribute system has improved 
where the ideal performance is unity precision and unity recall. 

Let the documents be represented with 'n' attributes in 
the data base. Consider the following definitions; 

Definition 1 ; The document space or data base denoted as D is 
defined as the collection of N documents where each document is 
characterized by an n -vector. 

Each component of this n-vector describes one attribute. 

For example, a set of keywords or index terms and a set of 
citations could be the two attributes in a particular system. 
Definition 2; Request is defined as a non-null n-~vector possibly 
with null elements. The elements of this vector have the same 
interpretation as in definition 1. 

Definition 3g Answer to a request R denoted as D* is defined 
as 

D* = -Ceij |f [M(R,dj)]} for all e D 
where M is a relatedness measure function and : 

f is a decision ixinction or predicate which determines , 
if a document is to be retrieved or not for the given regues 
In a weighted term search, the relatedness measure is obtained as 
the svim of the weights of the matching terms. If this s\im is 
greater than the threshold t, it is retrieved. The threshold 
T is normally given by the requester or is determined by the 

I 

system on an interactive basis. ' | 
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Definition 4g ]>>n indexing scheme is said to be perfect if 
the precision of retrieval is unity. 

By selecting one of the attributes from a n-attribute 
system, it is possible to derive n different single attribute 
systems. 

Theorem 1; A multi attribute system of information represen™ 
tation and retrieval is at least as good an approximation to 
perfect indexing as the single attribute system derivable from 
it. 

Proof ; An irrelevant retrieval of a document occurs due to 
one of the following reasons: 

(i) inaccurate description of the request, that is, 
what is described in the query is different from 
what is wanted. 

(ii) imperfect indexing. 

Let Pj probability, that the document d^ is retrieved 

as relevant due to matching of attribute j alone but later found 
irrelevant, possibly due to one of the reasons mentioned above. 
Then P(d^), the probability of document d^ being retrieved 
relevant with n attributes matching and later found irrelevant 
would be 

n 

P(djL) = TT P. (d. ) 1 < k < n 

j=l ^ 

l-P(dj^) will be a direct measure of precision. A multi attribute 
system does no bettdr to a query when the equality holds. 
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The intuitive idea of improving the precision using more 
attributes is stated by this theorem. In a practical system 
one would be interested in overall performance of the system 
for a representative set of queries. This could be increased 
by the choice of proper set of attributes and also educating 
the user v;henever possible regarding the attributes not men- 
tioned by him. (See Figure 3.1). The inverse relation between 
recall and precision would result in the loss of recall when 
precision in increased. Retrieval is a trade off betv/een these 
two and on line control is possible for them in an interactive 
systems . 

Theorem 2s For small ch.anges in request space, change in answer 
set D* with multiple attributes is less than or equal to that 
with single attribute under unity precision and recall. 

Proof ; Let D* be the answer set for the request r. Assuming 
ideal operating conditions, namely, unity recall and unity 
precision, any small variation in r, say Ar should not add 
any new document to D* nor delete anything from it. Consider 
the variation in query (request) due to imperfect indexing and 
synonym associations. 

r = r 2 , V 

r + Ar = <rjL + ^2 ° ° ^n ^^n^ 

The probability of adding a new document to D* in a single 

attribute system, say with the first attribute is due to r^+Ar, 

G 

matching with that of a document not in D*. In a multi attribute 


1 \ ft 
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this will be due to combined simultaneous matching of the n 
attributes and this joint probability would be less than or 
equal to that due to one attribute. Similarly one can show 
that the probability of deleting a document from D* is less 
in multi attribute system and hence the theorem. 

However if the variation in request is in only one attri- 
bute then the multi attribute system does no better than the 
corresponding single attribute system. A vrell selected set 
of attributes and user interaction would reduce such cases and 
statistically improve the performance with multiple attributes. 

3.3 A Case Study with Library Information System 

The bibliographic information of documents has been used 
in retrieval. For example, Salton (93) has suggested a scheme 

it 

to pick keywords for describing a document not only from the 
text but also from the cited documents. The citation index 
or bibliographic coupling used by Kessler (53) ignores the 
keyword information and bases citation as the only tool for 
retrieval. Both these methods are not ccmplate when considered 
individually. A multi attribute system of information organizatior 
for a library infonriation system is suggested as follows; 

Every document in the index file is characterized by the 
following five attributes; 

1. Author or creator of the document 

2. Time of publication or creation 

3. Name of the document or title 
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4o Index terms or keywords for description 
5. Citations or subject affinity „ 

The attributes of this system, shortly called. ATWIC, are choson 
from their common usage in manual systems. Different indexing 
schemes currently in use form proper subsets of this multi 
attribute system. 

(a) Author indexing <A T N 0 0> 

{ b) KWIC indexing <0 T N I 0> 

(c) Citation indexing <0 T N 0 c> 

(d) Subject indexing <0 T W i 0 > 

Each indexing has its own merits and demerits. For 
instance, citation indexing is technically simpler but essentially 
keeps track of the information in the past. Also, a user should 
know a representative document to formulate the query. The 
combination of these indexing schemes as a five attribute system 
combines the advantages. When the user describes his needs pre- 
cisely with more than one attribute, it is possible to improve 
precision. Suitable feedback from the system and query reformu- 
lation might improve also the recall. 

Now, it is possible to define a relatedness measure for 
ATNIC system using the last two attributes. Consider the 
following definitions. 

Definit ion 5; For a collection of m documents the ij^^ 
element in "document-document association matrix" i [Bl 

mxm 
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indicates the association between ith document and j-th document. 

®ii ~ ®-ii symmetric measure 

A 

= 0 by definition 

This association measure may be obtained from the common key- 
words, common citations etc (53), 

Definition 6; For a collection of n index terms a term-term 

association matrix, is defined such that T. . is the 

nxn XI 

association between the i-th and j-th index terms. 

Association between two index terms may be obtained from 

their statistical co-occurrence. This matrix essentially defines 

the "closeness" between various index terms (96) „ 

Definition 7 ; For a collection of m documents with n index 

terms a discriminant matrix [A] is such that the ij-th element 

mxn 

is a measure of the association between the i-th document and 
j-th term. 

This association may be obtained by considering the term - 
frequency in the document (96) . 

Definition 8; A modified discreminant matrix [D] is defined 
mxn 

as follows ; 


[D] = [A] X [T] 

mxn mxn nxn 

The resultinq matrix is similar to A, but takes into account the 


synonym between terms in the document corpus, 


Each document is now characterized by the corresponding 
row vector in D matrix. Similarity between two documents d^ and 



can be defined as 


sim (d. 


n 

d,) = S 
i=l 




for dj^ 


Though the above measure does not satisfy the metric properties 
given below, it is possible to define distance measures with such 
property (Appendix I) . 


d(A.,B) > 0 and equal to 0 implies 

Ar = B 

d(Ar,B) = d(BrA) 

d(2^,B) + d(b,C) > c.(A.,C) 

Definition 9;; Iielatedness between documents is the reciprocal 
of the distance between them^ 

Definition 10; Among three documents A,B and C if relatedness of 
A to C is obtained iraplicitely through another document B, then 
A is said to be related to C indirectly. 

Theorem 3 ; An indirect relatedness can atmost be ogua.l to the 
direct relatedness between two docviments. 


A 



D 


Figure 3.2 

Proof; Let the relatedness between tv/o documents A. & B be R(A,B) ? 
then the theorem states 


R{A,B) 


1 


^ R(A,C) + P.(C,B) 

< 1 

” KIA,U; + R(C,B) 


__1 < 1 
RTaTbT - R(A,cy 


+ 


1 

rTcTbT 
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d(A,B) < d(A,C) + d(C,B) 

(which follows from the above definition and metric property) „ 
Hence the theorem.. 


The theorem explains the intuitive idea that when related 


ness between documents is established through a third document, 

there may be unwanted retrieval. However, when the three 

documents are strongly related, that is the equality holds in 

the above relation, it does not introduce any spurious response. 

This argument can be extended to show that this indirect 

of 

relatedness is less effective as the level /indirectness increases. 
An indirect relatedness is said to be n-level if it goes through 
n distinct documents. 

Corollary : An n-”level indirect relatedness can be at most equal 
to (n-l) level relatedness. Proof of this follows directly 
from the definition and is given in Appendix II. 

A similar approach to define a metric using 'C' attribute 
in ATNIC becomes difficult (6) . Consider the following 
relatedness measures. 

Definition 11s A citation matrix [C] for a collection of 

X.lJvXi 

n documents is a binary matrix \mozo ij-th element is 1 if i-th 

document cites j~th docum.ent or else it is zero. 

Then relatedness between two documents, say i and j 

n 




+ C . . + 
31 


E 

k«>l 


'ik 




46 


This gives the nuiaber of comuion citations between i and j and 
the number of times they are cited, together „ This measure is 
symmetric, non -negative but the triangle inequality is not 
always satisfied. But in perfect indexing .scheme observe 
this measure will be a metric. For examiplo, consider the 
following condition (See Figure 3.2). 

A cites B and C. 

C cites B and no two of them are cited together 
Then = 1 

%C ^ 1 
R^B = 1 

Hence £ R^^q + r.(^. 

In perfect indexing scheme when two documents are related 
they will alvmys be cited together. However, in practice the 
citation index refers to ttie documents in the past and would 
also depend on the availability of documents to the writers. 
In multi- a.ttribute systems like ATNIC, a composite measure 
of relatedness considering all the attributes is required. 
Relatedncss V7ith regard, to other attributes in AT'MIC, namely 
author, title or tine of publication is but a simple riatch 
condition. Relevance of a document to the given query can be 
obtained as the weighted sum of the relatedness of different 
attributes used for search. 

3 . 4 Attributes Selection for File Organization 

The desirable characteristics of the attributes in a 
multi attribute system would be ? 
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1. Independence; The attributes should be as independent 
as p'’''ssiLle so that raaxinum information is conveyed 
jointly. 

2„ Use in context; The significant 3:oy'ii;'ords in the title 
are also in the index terms attribute. However j the 
title when presented in its complete form is more useful, 
in particular for user feedbaclc. 

3. Usage" The attributes should be the ones often used. 

This factf-^r justifies thiOir inclusion in the data base. 

4. Compatibility 5 Compatibility v;ith existing indexing 
systems is desirable for f rncilitating file organization 
and data collection. 

The five attributes suggested in Section 3.3 satisfy most 
of these requirements . 

The follov/ing algorithm is proposed for the selection of 
attributes for a data base from a set of m possible attributes. 
Definition 12 ^ Self information of an attribute is defined as 
log qj_ where = 1 “■* p^^ and is the probability of the i- th 
attribute being used in a query. 

Prom this definition it is clear that one may not have 
such data when the file is organized for the first time. However, 
such data can be collected for systems in operation to aid other 
systems design as well as for self evaluation. 

Definition 13s Joint information between two attributes is 
defined as >" log Pj_j where is the joint probability of the 
two attributes i and j in the data base. 
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’■fhen tlio joint probability is unity^ the above measure 

becomes zero and these attributes are samcj as far as retrieval 

is concerned, ’>Then p^j is zero this measure is maximum and the 

tx'ro attributes do not cc-occur at all. 

Let a joint information matrix fC'.r a collection of n 

documents be [I] vhose ij-th 'ilement is the joint information 

nxn 

between i-=th and j th attributes. Then., a method to select 
n attributes for the data base is as follows: 

Step 1; Enumerate all the T'’Ossibly relevant ‘m’ attributes 

of the data base. This m is greater tha.n n^ the number 
of attributes that one wants to use in file organization. 
Step 2 s Obtain the self information of attributes and the joint 
inf(.jrmation matrix I, This might need sampling of the 
datca base when it is leirge. 

Step 3s Choose as the first attribute the one having the maximum 
self information. 

Step 4 3 To chciose the next attribute look along the row (or > 

column) of the I matrix pertaining to this attribute. 

The attribute having maximum joint information with this 
is chosen. 

Step 5 g To choose the third and subsequent attributes one can 

adopt the same rule with the follov^ing modification. Each 

attribute which is already choson can be assigned a ' 

1 

weight based on its self information. Note this self 

i 

information is a measure of the probability of that [ 

attribute being used for query. The weighted sum of the | 
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rov7 (or column) vectors pertaining to the already 
chosen attributes gives n. single vector. The situation 
new is identical to step 4. If the number of attributes 
selected is less than then go to step 4. 

In an informeition system design j, file organization,, query 
language, and retrieval method are highly inter- -relatad. In a 
query language design one V 70 uld like to improve the recall by the 
union of attributes v.rhich will account for insufficient knowledge 
of the user about his needs and. also any variation in indexing. 
Similarly precision can be improved by conjunction of attributes. 
However, it must be remembered that they are inversely related. 
Consider a query as follows s 

^ “ ^1' ^2- ^3'” ° " ®n 

where Sj_’s are such that their joint information considered pair- 
wise is minimum. This V70uld improve precision. If each is a 
union of attributes one can account for combined advantages of 
both. Such a scheme has its own disadvantages as increasing the 
number of attributes v/ould increase the storage and retrieval time. 

Using the above algorithm and the five attributes mentioned 
in Section 3.3 a case study was conducted with a sample data*. As 
the name or title attribute is unique and different .fO'r different 
doexjments in the sample chosen, it is not included in the I matrix. 
Let us consider a hypothetical case wherein each document has one 
author and one index term only. Then the joint probability 


* Articles published in Comm, of ACM, Jan-Dec, 1969. 
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of tile author and infomation content attribute is given by 

No. of documents with and Ii 

Tv ^ r= =t— 

^ l-J-l Total No. in the sample 


Then 


m 

P = Z 
i=l 


n 

Z 

j=l 


w 


11 


“^i^ j 


for m authors and n index- terms . 

This summation is to be normalized to lie in the range (0,1). 

When there are k index tc^rms for c. document, the k dimon"' 
sional vector is converted to a scaler by assigning weights tO' 
these keywords. Probability indexing mentioned earlier (Section 
2.2) is useful in assigning these v/eights. obtained for a 

pair of documents is averaged over the scimple space to obteiin a 
single entry in I matrix. One can define other joint probabilitie 
following the same technique and the resulting I matrix is shcvrn 
in Table 3.1 vAere the unit of information is "nats'* 


Table 3,1 



A 

T 

1 

C 

A 

0 


~j77jr~ 


T 

3,835 

0 

3.849 

4.001 

I 

3,726 

3,849 

0 

3.992 

C 

4.029 

4.001 

3,992 

0 


I ■ Matrix for the case 
study 

Prom previous experience index terms attribute is chosen as 
the attribute having maximum self information and the rest are 


assigned equal probability. This leads to the following ordering 


UJ 
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I,Cj.T and Ao. The experiment concludes that if an information 
system is to be designed with two attributes only, I and C are 
recommended for this data basOc In any practical system it is 
desirable to keep track of the frequency of usage of the attri- 
butes in the system as well as those not in it„ This would be 
useful in evaluating the selection strategy and modify it when 
requiredo 

Having considered some results and aspects of multi- 
attributes in information retrieval, the following sections 
describe the use and effect of user feedback in improving the 
performance of retrieval systems o 
3 . 5 A Model for Retrieval Systems 

An information retrieval system is a collection of indexed 
documents, requests for search txnd one or more retrieval methods. 
The retrieval method associates vrith each document a relevance 
number for a given query. Prom this point of viev/ it seems 
appropriate to use the notion of fuzzy sets (114) in defining 
a retrieval system. 

D efinition 14; Fuzzy Set; Let Y be a space of points whose 
generic element is Y such that Y = {y}. A fuzzy sot A in Y 
is characterized by a ''membership function" f (y) . For every 
mwmber of Y the function f(.) assigns a real number in the 
interval [0,1], f (y) is called the grade of membership of y. 
Definition 15 g Retrieval Process s A retrieval process is an 
ordered triplet <D, fp,(.), R> where D is 


the collection of 

J.t.t. KANPUR 
USKAitY 


At* Ax 
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indexed documents;, ^P. membership function defining 
a fuzzy set D* in D and 
R is the given request o 

In a multi attribute system using n attributes the request 
will be a n-dimensicnal vector and D will be a set of n dimen-' 
sional vectors. The membership function is the algorithm for 
the relateduess measure between a document and the request. 
Consider the following definitions and the theorem duo to 
Salton (96) . 

Definition 16; lui inclusive information retrieval system may 

be defined as a partially ordered information retrieval sy.stem, 

such that for all r, s e R 

r ^ s =>T(r) C T (s) 

where Rs set of requests 

> a partial order relation in P. 

T(''r)s set of documents retrieved for the 
query r. 

Theorem 4; In an inclusive information retrieval system, the 
image under T of a decreasing chain in R is nn increasing chain 
in 2^ (sot of all subsets of the documents) and the image of any 
increasing chain in R is a decreasing chain in 2^^, 

This theorem states the common observation that more 
specific the request -■ or equivalently, the larger the number 
of terms used to fomulate the request - the smaller is the 
size of the retrieved document set. According to this defini- 
tion a standard keyword system is inclusive. The seime notion 
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can be defined using the fuzzy set model as given belov 7 , which 
is not restricted to keyword matching « 

Definition 17 g Inclusive information retrieval is a class of 
requests whose membership functions are identicals 

Two membership functions and £3 ( - ) defined over the 

set X are identical iff 

f^(x) = f^(x) V X e X 

This definition is not restricted to the keyword system » A more 
specific and more generic request is, now represented by a 
threshold. A document is retrieved as a response to the given 
query if its grade of membership is greater than or equal to this 
threshold, A larger value for this threshold means more specific 
retrieval and vice versa. 

Definition 18; relation between two requests r, s e R is 

defined as 

r > s iff T„ > T_ 

^ I. — s 

where and are the corresponding thresholds associated with 
r and s. The relation ^ is reflexive, antisymmetric and transitivei 
VIhen the membership functions are same the request with larger 
threshold retrieves lesser number of documents and vice versa. This 
explains the above theorem due to Salton, 

This formalism, has the advantage of being simpler to 
interpret. Notions of probabilistic indexing can easily be made ; 
part of the relatedness measure or the membership function. 


54 


3„6 User Feedback in. Rotrioval S ystems 

There are two possible ways of improving the performance 
of an information system? namely^ multiple attributes and user 
feedback. During an online? search procedure , information about 
the retrieved documents are displayed to the user. This creates 
an interaction with the user to reformulate the query when 
needed. The display may be excerpts of stored c ictit.narics 
term list, term frequency info^rrantion , related index'* terras , 
titles or abstracts of some selected papers. The information 
about the retrieved and displayed document being relevant or 
irrelevant could be used in modifying the query. Results 
obtained from one such experiment with SRiART system is shown in 
Figure 3,3. It has been found in that experiment that the infer’- 
mation used from the retrieved irrelevant documents in formulating 
the query is quite useful. 

Though the relative merit of the information obtained from 
relevant or irrelevant documents cannot be categorically stated, 
the use of feedback is found to improve the performance in 
general. 

Definition 19 i A feedback retrieval process is defined as an 
iterative process and the i~th stage in the iteration is 
characterized by 

<D^ , ( . ) , R^> 

In any practical information system we can assume the document 
collection D and the membership function <3o not change 



precision 



Figure 3.3(a); Feedback v?ith relevant documents 



Figure 3.3(b) s Feedback retrieval performance 

(for two iterations) (see Ref. 18) . 
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for the given query- Thus the modification in the process 
is essentially in query formulation- Hov/ever for systems 
like on line ticket reservation it is not uncommon to consider 
the modification in D alsv>« 

For a given query let the cm&vjer set from an ideal system 
be D*, Then D* will contain all relevant documents in D and 
all documents in D* will be relevant- Though the precision is 
somewhat easy to measure from the user response it is difficult 
to measure the recall- One v/ay to do this in an interactive 
system vrould be to make a more generic re(^uest and sec whether 

/-Nr 

any new relevant document is being obtained, Hov/ever one has 
to consider the point of diminishing return in this iterative 


process. 


Let Dj. be the approximation to D*, As D* is not knovrn 
apriori, a possible way to decide the termination of the itera- 
tive process would bo to check if 

^i_^i-l__, i-”2 _ i"n . . „ 

"r = “r = '’r ’ ' ’ '^r ^ " 

The equality may be considered not only from the relevant and 
irrelevant documents point of viev? but also v/ith the cost and 
value of the modified answer sct- 

The i' th stage modification of the request would depend 
on the previous stage and its answer set. Similarly the threshold 
for selection will be modified. 


= H{D, 


i"-l n-i-l 


) 
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G and H are called "modifying functions". In general? guide- 
lines can be given to form more generic or more specific queries 
for a particular indexing scheme. But the modifying functions 
G and H will depend very much on the requester and his relevance 
and specificity judgements. However it is felt that a statis- 
tical study of the users in such feudhjack system may be helpful 
in classifying the users into uqui valence classes and character- 
ize these functions in seme sense, iis a modification to the G 
and H functions one can consider the use of past information 
beyond one previous stage in modifying the request at the ith 
stage . 

Definition 20; A retrieval process is said to be bettor than . 

another retrieval process $2 iff there exists aj^ r. for which 

M(D_T,R)i,o 'pre -i tar than v/here D . and D ^ are the answer 

rl' - rz*^ rl r2 

sets obtained for the query R from c;nd S 2 and H is an evalua- 
tion function. 

In its simplest form the evaluation function can be weighted 
sum of the grade of memberships of the elements of D^.. The 
weights assigned meiy be positive or negative according to the 
judgement of the requester. 

Theorem 5 •: A feedback retrieval process is better than the 

corresponding non-feedback process. 

The validity of this theorem can be proved by the following 

argument. Let Dj, be the answer set obtained for the query E 

obtain 

with threshold x. It is always possible to/R’ and x’ which would 
drop out atleast one relevant doeviment or add an irrelevant 
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document to D^, If one starts the retrieval process with 

R' and t' the answer set obtained would be O'*. But by cons-’ 

r 

truction 

R) < 

It is possible to obtain r. from 1/ by propv^rly defining the 
modifying functions. Suppose the indexing is perfect, then 
it is possible to include proper attribute or tr modify the 
threshold in an interactive system to get back the dropped 
out document. 

Hov;ever, feedback might give increased irrelevant docu- 
ments when it is not used carefully. It will be the responsi- 
bility of the user to keep track of the sources of such increased 
irrelevant retrieval and avoid them by careful query formula-’ 
tion in subsequent iterations . The experiments with SIlTir.T 
system and project TIP have shown that on an average 20“’30% 
improvement in recall and precision is possible with feedback 
retrieval (181) . 

It is quite legitimate to think at this stage that the 
multi attribute system being a better approximation to perfect 
indexing should be able to do better in feedback retrieval. 

The probability of the user interacting with the system is also 
more in multi attribute system. This intuitive idea is said 
in the following theorem. 

Theorem 6 ; An n attribute feedback retrieval process is at 


least as good as (n- 1) attribute process with feedback. 
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Following similar arguments as used in multi ‘-attribute 
system with probabilities and the existence of at least one 
request for which it does better eas in the above theorem, the 
proof of this theorem is direct = 

Consider a keyword system with user feedlDack and a system 
vjith keytrords and citations or bibliographic data v/ith vrer 
feedback. The theorem states that the later V7ill perform better 
for at least one request. The experiments have shown in keyword 
system that feedback improves the performance on a statistical 
average,' but it remains to explore the performance of a multi 
attribute system and the choice of attributes for an on«"line 
interactive system. 

In this chapter some results on multi attributes and user 
feedback in information retrieval have, ibeen discussed and earlier 
results obtained in the literature are restated. The main 
motivation is to indicate the growing need for specialised compu*- 
tational power in automatic information retrieval. In this 
context, it is worthwhile to exploit the special nature of the 
information retrieval problems and design a special purpose 
computer system, well matched for this area of application. With 
growing needs in diverse areas f;.f information retrieval such 
system design will be of general interest to a vjidening class 
of users. In the succeeding chapters we will explore this 
specific problem in detail. 
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CHAPTER IV 

COHPUTSP SYSTEP* ARCKITECTUmT 

4 . 1 Hierarchy of Operations j.n a Ret rieval Process 

The th.re <2 levels of an inf creation retrieval process dis- 
cussed earlier., namely j, query pre-processirig . retrieval and, 
dissemination have different requirements. For instance, the 
'‘matching" required in the first level is not as much as that 
in the second level. In a typical case the first level processor 
may analyse the query and determine the physical files to be 
searched. There may be a synonym dictionary and a file directory 
to be consulted in determining these physical files. As the 
system will be used in interactive mode the first level processor 
will have the facility for time sharing. 

The requirements of tlie second level procossors are different, 
The coiranunication with slot*? user terminals are taken care of by 
the first and third level proccsscrs. The main function of the 
second level processors is to select the set of pointers to the 
relevant docximents for the given query. This would involve the 
input from the index file, internal processing to select the 
relevant documents and Gommunicating the selected pointers to 
the third level processor, hs the selected files for search 
can be examined in parallel there may be more than one second 
level processor in the system. In such cases the number of 
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processors allotted to one query, at any time v;ill depend 
on the nuraber of available free prc'cessors and computational 
requirement of the queries awaiting service Unlike general 
purpose time sharing system (28) it is possible to estimate 
the computational requirement more precisely for each query. 

The Siam of the expected processing times of- the files selec- 
ted for search, in response to a given query will give this 
value. 

The third level processor has comparatively loss matching 
and computation to perform. The pointers setected by the 
second level processors are used to pick their corresponding 
texts or abstracts from an abstract file. The selected excerpts 
are then presented to the user. For an interactive system it 
would be preferable to have a soft copy display. The complexity 
of this third level processor might vary depending on the 
allowable cost. For example, in its simplest form the pointers 
constitute the "'call number'' of a selected document or 
the bibliographic reference to the soloctod documents and they 
can be directly presented to the user. The hard copy facility 
may provide the abstracts of the selected documents on request. 

It is possible even to store the entire text on computer operated 
microfilms (COM) (2) and provide the user with a hard copy of the 
selected document. In the simplest electronic catalogue system, 
operating in an university environment the soft copy output of 
the pointers to the selected documents might be sufficient. 
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Considering the requirements of these three level 
processing y a computer system with hierarchy of processors 
is proposed o In this system there can be more than one 
second level processor operating concurrently o The first and 
the third level processors being limited in their functions, 
it is possible to consider them as sp>4.cial purpose time sharing 
systems „ Such an approach would minimize the total cost of 
the system. The overall organization of the proposed computer 
system is shown in Figure 4,1. The inter processor control 
and data communication among the second level processors is 
possible only under the control of the first level processor. 
When such inter processor a.ctivity is low as in information 
retrieval problems, the overhead will not be prohibitive. The 
unibus shewn in Figure 4,1 constitutes the control and infor-- 
mation path between the first level processor and the second 
level processors. 

4 „ 2 Requirements of a Computer System for I nf ormation F.etrie val 
The need for interactive processing as discussed in Chapter 
III for information retrieval process requires the first level 
processor to be a time sharing system. The time sharing is the 
simultaneous employment of a computer by several users and 
each one has the illusion of being the only person using the 
machine. The main hardware and software features of a time- 
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Figure 4.1s Computer system with a hierarchy of 
processors. 
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sharing computer arc as follows (82) s 
Essential hardware features; 

(a) Memory Protection 

(b) Interrupt feature 

(c) Built in clock 

(d) Dynamic program relocatic-n 

(e) i':.uxiliary Storage 

(f) Typev^riter and display consoles.. 

Essential software features: 

(a) A supervisory program for momory allocation 
and management in general 

(b) A command language for user communication 

(c) A debugging language for on line debugging 

(d) A symbolic programming language to simplify 
program writing . 

The above requirements of a time-sharing system have been 
considered for general purpose computing in which problems 
of different kind are solved by users (100),. A special purpose 
time sharing system for a particular application would need 
much less effort in the operatiiirj system developiacmt^ Fo.r 
instance, in the time sharing system of the proposed architecture 
one would be restricted to problem oriented query language. 
Developing a processor for a query language for information 
retrieval would be much simpler compared to compiler type 
languages like FORTRAN or ALGOL. For these query leinguages 
the debugging system would also be simpler. In such a system 
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the users at the console will be using a problera orientec* query 
language and symbolic programming v^ill be reserved for the 
system implementation and modification. 

However, the hardware requirements mcnticned earlier are 
necessary even for the special purpose tir-se sharing systems. 

The decreasing cost of hardware v^ith integrated circuits have 
shown that the major part of such systems would be governed 
by the cost of the softv?are (55) and hence tho total cost of 
a special purpose time sharing systora vrould be less. The 
complexity and the functions of the third level processor 
are much different. The pointers selected by the second level 
processors are used to pick the corresponding abstracts and 
get them displayed at the appropriate console. In other words 
this processor V7ill be executing a fixed program all the tim.e« 
For example, this processor may have a. COM file and a fixed 
program to do the following ;; 

(i) To collect all the pointers selected for a 
particular query. 

(ii) To pick the.- appropriate abstract from the 
abstract file. 

(iii) Logic required to switch from ono user 
terminal to the other. 

The nevr product announced in (115) has a disk memory and 
can serve more than thirty of its terminals each equipped with 
a keyboard and a CRT display. This, with a little modifi- 
cation can function as an output processor. When one needs to 
have flexible priority scheme in servicing the teanninals this 
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output processor would also become a programmable time sharing 
system. A fixed priority terminal service and fixed program 
execution seems to be adequate for most of the information 
system requirements. However^ it should bo observed that it 
is possible to completely avoid the output processor and 
endovr this power to the second level processors. For example ,1 
in simplest "electronic catalogue” systems where only accession 
numbers or the bibliographic details of selected documents are 
presented, it is possible to make the second level processors 
to do this display function. But the communication with the 
slow output terminal would degrade the performance as the 
fast second level processors would then be waiting for the 
completion of the output function,. In this architecture such 
communication with the slower terminals are separated from the 
second level processors. 

4 . 3 Architecture of the Proposed Syst em 

The flow of information betv/eon various functional blocks 
in the system are as shown in Figure 4.2. Essentially there 
are two phases?. 

(1) Information organisation and stfjrage 

(2) Information retrieval. 

These two phnses of operations could be mutually exclusive for 
a library information system. In other words, during the 
short period of file updating query processing will be suspended. 



Information path 
Control Signal path 

Figure 4.2s Control and information flow in the 
hierarchical system. 
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user logs in frora any one of the free terminals as in 
any time sharing system.. The central TSS {Time Sharing System) 
responds to the user request aftci: analysing his pass word. The 
query processor residing in the core of the TSG is called for 
service when tho query from the termincil is comnl^itely received. 
This program in association with the dictionary of synonyms and 
the file directory determines the set of physical files to be 
searched for the given query. The file directory would depend 
on -the file organization and tho nature of tho physical file. 

For example^ in a serial file organization and tape file system 
the directory would give the number of initial files on the 
tape to be skipped to get the physical file needed. During tho 
retrieval phase the TSS is executing the same fixed program 
with different data which arise from the terminals. Thus it 
is possible to code this program in reentrant fashion (43) . 

The information generation phase is the one during v/hich 
information items are added to the files, Tho flow of infor- 
mation in this phase is shown Vi’ith double lines in Figure 4.2. 
During this phase, tho second level processors could possibly 
be used for automatic information analysis. 

After determining the set of ".hysical files to bo 
searched for a query, the TSS allocates a certain ncanber of 
free processors to this job. Unlike, computations v/ith a 
general purpose computer, a quick estimate of the processing 
time is possible. The processing algorithm and the 
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size of the physical file are fixecu This copibined v;ith the 
number of keys in the query will give an estimate of processing 
time required per query o Pcsed, on this., the scheduling routine 
allocates one or more processors to tho j:.ib as availableo 

The TSS has a list of pointers to the processors waiting 
for a task. When an assigned processor completes itstaoJ: , 
it interrupts the T£S and this free processor is added to the 
above list. When tiiis list is empty the jobs vrait for processors. 
The TSSj sends control signals to the selected processors. These 
control signals establish the connection between the. TSS and 
a second level processor for data communication. The attributes 
of the query alongwith the files to be searched, are then passed 
to the P.E. (Processing Element) and stored in their memory. 

The P.E.’s become autonomous thereafter in completing their 
functions. The given query is used in searching the assigned 
physical files. The set of all selected pointers are sent to 
the output processor memory. 

The output processor would have a common back up storage 
accessible to the P.E.'s (through their output channels) as 
well as to the output processor, Cn this back U’"- storage each 
terminal will have an assigned user area for storing the 
selected pointers , When there is onci output channel for each 
processor receiving information in parallel, the cross bar 
switch shown in Figure 4,2 is not necessary, Hov.rever, when tho 
number of processors is large and the simultaneous transfer of 
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pointer information is less probable, a small number of output 

channels will suffice „ The cross bar switch then provides 

connection between any c^f the output channels and a second level 

processor. The output processor might do some • 

-“■‘-ocQssing on 

these selected pointers. For instance, it might give the count 
of the number of documents retrieved or might present the 
documents of higher relevance first. 

In this architecture, that i-'ort of the ^ ■ , 

t ~>jcessing which 

needs user interaction and deals vjith slower terminal d'-’vic'^s 
are separated from the rest. The PE“s have comparatively lo'-^' 
time spent in input and output. The internal processing by the 
PE’s for retrieval could be simple keyword matching or matching 
structured information like graphs (23) which form the two 
ooitimonly used data types. Considering the flexibility in 
implementing various algorithms with these data types the PE”'' 
are made programmable. However, the memory to store these 
prograiris and the instruction set will be much smaller Some 
aspects of the design of a typical processing element are 
discussed in the next section,. 

In summary, the hierarchical computer system has the following 
major subsystems.^ 


(i) A time sharing system tc process the regue^’t 
for retrieval and a software query processor^ 

(ii) A processor allocation program 

(iii) An unibus used for communication between p f 
and the TSS, 

(iv) A set of dedicated processing elements canabl 
of concurrent operation, ^ ® 
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(v) A cross bar switch to connect any of the 
PE's to one of the output channels » 

(vi) An output processor to receive the pointers 
selected and display the information » 

(vii) An abstract file and an inde:< file„ 

(viii) A set of user terminals v/ith a keyboard and 
display or typev^riter whichever is feasible = 

(ix) Update programs for information storage files = 

The TSS and output processor with human interaction for 

time shared operation would pose no nevor probleias (71). On the 

other hand restricting the design of these two time shared 

systems to their special task would considerably reduce the 

cost of the total system. However, the considerations in the 

P.Eo design would be different andaJf® discussed in the rest of 

this chapter. 

^ Some Considerations in Computer System Design 

Any design is an iterative process and strikes a trade 
off betv-7een the needs of the user, called system architecture 
and the needs of the fabricator, called system engineering (19) . 
The system architecture of a r„E„ is discussed in the following 
which includes engineering considerations, go that the design 
will be economical,, 

Mode of Operat ion; The r-„E. has the following modes of 
operations 

(i) Run complete mode 

(ii) Run interrupt mode 

(iii) Program mode. 
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In the firet mode the PcE» completes its function by completely 
processing the assigned physical files. ht the end, of this 
operation it interrupts the first level processor and joins the 
list of free processors. In the second v;hen a response is 

found for the given request it automatically interrupts the TSS 
after coirmunicating th;*' selected pi'jinter to the output processor. 
The interrupt service routine in the TOP •“■dds this processor 
alongv/ith the other processors allotod. to this terminal to tho 
list of free processors and the query processing is complete. 

This mode of operotion is useful in “patent search'’ and so on. 

In program mode^ the assembled program to be executed by the 
P.E. is loaded into the program memory of the selected P.E. 
from the first level processor. When tho number of such programs 
are not many it is possible to think of fixed programs stored 
in magnetic cards . In addition to this external interrupts i 
the P.E. has I/O interrupt facility. The I/O interrupt scheme 
is much simpler than a general purpose system as the I/O devices 
are fixed and less in number, Tlix.; input to the P.E, comes from 
the index file and the output is to the output processor, 
tfe mory requirement ■■ Thorc exists a definite speed mismatch 
betvjeen the faster CPU of the P.E.s and the slov/er back up 
memory in V7hich the index file is stored, A primary memory of 
proper size w-^uld alleviate this speed mismatch. The two 
conflicting requirements in the selection of the memory size 
for the P.E. are the cost and I/O boundedness. When the 
processing time of one core load of the memory is smaller as 
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compared to the access tirae of that information from the 
back up file, the P.Eo is said to be foouncled. by input. 

Similarly one could define the output boundedness, Hov/ever 
the possibility for output boundedness is small as the 
information transferred is low. The preliminary mor'-'ry design 
would need an estimate of the average processing time per 
query per document, average access tix-i,e of the file and 
the storago required per document, A modular organization of 
memory (19) in banks would be useful not only in accomodating 
later expansions but also in using the memory banks for simul" 
taneous processing and I/O transfer (33) , It is to be noted 
that the memory size selection should be to exploit the full 
processing pov^er of the P,E, by avoiding the I/O bounded 
condition. Memory size required, using the data obtained 
from a sample experiment is discussed in the next section, 

C . ? , U o P .e qui r emen 1 1 As.rchitectural design of the CPU is 
intended to be oriented to end-use and proceeds backwards. 

Some factors to be considered in this design are in sequ-ejL ; (26) 

1, Data type formats 

2, Word size 

3, Address spaces 

4, Registers and uses 

5, Minimum operations set 

6, Required address modes 

7, Instruction set 

8, Instruction format. 
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Data types; Data types commonly encountered include ? integers 
real numbers, individual characters , English text,, symbol strings, 
pictures and directed graphs. Basically the data in information 
retrieval problems are Vc.riable length records. Often various 
ceding schemes are used to code the descriptors into a fixed 
length code (87) . The number cf such code words per docuiaent 
is variable in nuraber. The common data types used in these 
problems are integers, characters and graphs, 

Word Size; hll the third generation computers have word 
organized structure v/ith the facility for character handling. 

As the first iteration the lower bound on the word size is 
obtained by considering the range of data values. The upper 
bovmd on the nxamber of bits per word is by the economic constraints 
and average utilization of the memory. In this case the P,E„s 
are dealing with indexed and coded files and there is no need 
to have extended character set. A word size of 18 bits giver, 
a sufficient integer range for storing the pointers in one word 
for a file as large as 256 thousand documents. The recent 
trend in LSI memories (81) shows that the c.-jst of memories vary 
linearly with vrord leiigth. Also tV70 words (36 bits) when used 
jointly for a keyword provides sufficient range for coding the 
various keys in the system. The ur'G of such i^ord length is cor’men 
in some low cost mini computers (1U3) , However, a 16 bits or 
12 bits word length with multiple word length format also vrould 
satisfy most of the requirements. Since the keywords are 
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stored as fixed length codes for ease of processing,, the packing 
density of the memor;5' vrill not be affected when it is a sub- 
multiple of the code vrcrd lengths On the other hand a larger 
word length memory reduces the number of memory accesses-, 
the vrord length is too long^ hov;ever,. packing of code vrords 
will be necessary for effective utilisation of the memory. 

Address Spac es s In this part of system design distinct address 
spaces or areas of storage care defined and the data types mapped 
into them. Unlike the general purpose computers the P.E.s operate 
with restricted type of programs 'which are changed only under 
the control cf the system. In the limit these programs can be 
vjired in the processors as discussed in Chapter VI. Hence there 
is no need for a large address space for programs. However, 
programmability endows some flexibility when they are used in 
different information systems. 

Registers s The recent trend in computer system design is tov/ard 
a multiplicity of general purpose registers rather than a separate 
index, arithmetic, I/O and extension registers (11). However, the 
complexity of the control unit design with such general purpose 
registers is much more (88) , The use of multiple registers in 
the C.P.U. in matching operation cud handling variable length 
data would be more useful and they are discussed in Section 
4 . 5 , 

Minimvim Operations; The basic set of minimum operations on 


the data types are summarized here. This ultimately will evolve 
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into an instruction set. The P,E,3 need to have the following 
operations; Integer arithmetic, matching, and. Boolean operations. 
Addr e s s Mode s ; The same argument of restricted type of program 
execution and dedicatedness of the ?„E,s leach to the choice of 
direct addressing. Indexing, base address displacement, indirect 
addressing, addressing through registers, are all nice, but are 
not always necessary (26) , 

Instruction set and form at; Defining even a first-“Cut instruction 
set and format is an elaborate art. This process is crucial to 
computer architecture, .P,s a means to modify or to incorporate 
new instructions, microprogramming is suggested in the literature 
(48), However, microprogrammincj has its own disadvantages. Con- 
sidering the main operations involved and the factors discussed 
earlier the following types of instructions are proposed in the 
first roxind;; 

(1) Load registers from memory 

(2) Store in memory 

(3) Integer Add 

(4) Skip on conditions 

( 5 ) Match 

(6) Branch 

(7) Subroutine branch 

The above factors discussed in designing a computer 
system in general are reconsidered for the P.E. design in the 
next section using some special notations , 
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4 „ 5 PMS anci ISP Description of the Architecture 

A computer system is complex in several ways.. There are 
at least four levels in which the complex system can be 0.escribed 
1» PMS (processor;, memory svjitches) level 

2, Programming level (Instruction-Set processor level) 

3. Logic design level 
4= Circuit level „ 

Each level arises from the abstraction of the levels below it. 

The PMS and ISP are the two descriptive no:>tations developed 
by Bell and Newell (12) to describe the computer system in the 
top two levels.. In this., the informal notations in use for 
describing the computer system are made consistent and more 
powerful. .More than thirty computer systems liave been des- 
cribed in this notation by Bell and Newell (13) and is chosen 
here for its compactness. 

The BNF description and semantic interpretation of these 
formalized notations are given in the book cited earlier 
and Appendix III gives a brief introduction to the notation. 

The various components of PMS level descrij;>ticn are given the 
notations as shown below s 

1, Computer (C) 

2„ Processor (P} 

3. Data (D) 

4. Transducer (T) 

5. Control (k) 
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6„ Switch (S) 

7 . Memory (tl) 

8= Link (L) . 

In ISP level, notations have been c'eveloped to describe the 
following s 

1„ Data types 
2. Ins true tic.'.ns 
3» Operations 
4„ Pre^cessors 

Description of a computer system in PMS level is the 
detailed study of the memory, processor and various switches 
in the system and their characteristics.. We will, nov; consider 
the memory size requirement of a typical PoE, in the proposed 
architecture „ 

Let 

s = the average nmnber of instructions 
executed per second by the P,E» 
n = the number of words in the P,E„ memory 
m = average number of words required per query 
k = moan access time of the index file 
t = trmsmissicn time per word to memcry.. 

Then 

= k + nt 
s 

would make the processing time equal to the input time. When 
there are two memory banks each with n words , they could be 
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used in the same way as swinging buffers are used in I/O 
channels „ This in turn avoids input bounded conditions „ 

Number of vrords in raeinory^ n = — 

(in - St) 

For the value of s as 500,000,' a as 10; k as 30 milliseconds 
for a typical disk monitor system (11) and t as 2 micro 
seconds^ the memory size required would be about 2K words = 

In the PMS description of a P„Eo t^ao bconks of mcrnfiry with eroch 
P»E„ vjill be assumedo The cost of such a. raemnry size does 
not seem to be prohibitive as the cost of lsi memory is 
decreasing (98) „ 

The P14S description of the complete architecture is 
shown in Figure 4^3 at the structure level and. expanded in 
Figure 4„4, However, the components descripti'^n of the first 
and third level processors are not shovra. The number of 
simultaneous terminals to be handled by these time sharing 
systems would depend, upon the average arrival rate and the 
average time the user stays in the system. The PMS descriptions 
of such time sharing systems and their design methods are 
discussed in literature (11, li) . 

The PHE diagram of a typical second level processor 
is shcx'7n in Figure 4„4(b), It co.nsists of a processor whose 
ISP description is given later. The data storages and 
are two memory banks. The sv/itch E 2 connects either of them to 
the index file for information transfer. The control, 
connects the processor to the common data bus which forms the 
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Figure 4..4(c); TKS diagrejr for third level processor 



central broad-cast line for information transfer from the 


first level processor. For instance., each P.E, can be 
associa.ted with a ujiique identification code.-. On receiving 
this identification code the af;'r;r''r'rir? te is connected 

for further information transfer thr-^ugh data bus. The 

control Ko and link L establish the connection between the 
P.E, and the output processor, iMl . E , r.llfvbted- for a 


single task will be connected ti- tlte soiVic out'^at channel. 

This connection between the P,E,s and output channels is 
established by c\ cross bar like switch under the control 
of Kc and is diroctly controlled k-y (Figure 4,5) „ 

The M constitutes the primary memory for the P,E, and 
is used for storing the programs for execution. These programs 
can be assembled in the first level processor and stored in 
these memories at the beginning, A memory v/ith IK words would 
be sufficient for storing the fixed programs and the query 
attributes. When the nuraber of different pr'-'grnms executed' 
by the r,E, is not many,, they can bw steered in fixed memories 
(72) and query attributes might be storec-l in r'.-n or H„, 

As the index registers v;ould add to tJae hardware cost 
it is avoided, liowevcr., the indirect addressing feature 
is cheaper to implement (2C) and also useful in setting up 
loops in p-trograms and is thus included in the ISP description. 
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ISP Descrinticn of a 
Processor Stater 

A <0sl7> iAccuraulator register 

L One bit extension for accumulator 


R 


for overflcvr and carry. 

One bit indicator « It is ON when 


Interrupt disable 

Interrupt type 1/ITl 
Interrupt type 2/1.12 

Interrupt type 3/IT3 

Interrupt type 4/IT4 
Program i-Iode/Pili 

n 


the j'j.rogram is oxacutecl. 

lo when interrupt is to be disabled. 

Program controlled. 

1 t'/hen thve P„E. is interrupting the C^. 

1 when the P.E. is interrupted for 
input from 

1 when the P.E. is interrupted for 
output . 

1 when the P.E. has to interrupt the C^. 
1 vjhen the P.E. is in program mode 
and controlled by C,^. 

Hode selector to select one of the 


modes of operation discussed earlier. 
PC <0;13> Program counter 

BI <0'il> - PC <0,il> hencry bank indicator 


Memory State s 

Mp [0;1777g] <0sl8> Memory bank 0 

[0;3777gl <0sl8> Memory bank 1 

fig [40000g ; 77777 g] <0 ?18>Meraory b.arik 2 


*18 bits/word has been used for the 16P description. 
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Instruction Format;' 

Instruction/i < 0 ; 17 > 

OP <0s5 > ?= i<0 5 5> 

Indirect bit/ib ;= i< 6> 

address/addr <0:;10> ;=i<7:, 17> 

Address Calculation; 

Z <0;13> ;= ( Effective .tiddross 

( ib Z’'-' 1 Direct addressing 

ib IJ(Z'')) Indirect addressing 
Z*' ;= BI 2 adClr Memory bank nnd address 

selections „ 

Instruction Interpretation; 

E A ' — 3 1 -»• (instruction -4-M[PC] ? 

PC 4- PC + Iv next 
instruction execution) 

I ^ PC ; PC ^ 1 interrupt disable ^ 1) 

Instruction sot and Instruction Execution Process; 

The following is the tentative instruction set. Data 
types and the operations are given prime consideration in 
selecting this sot and no simulation at the function, level 
has been done. 

get (A ^ H(Z])j Clear and add 

Sto (M[Z] A) ? Store 

tra -»■ (PC Z) ? Transfer 

tmi (sgn-^ (PC Z))? 

Sgn ;= A <0> 


Transfer on minus 



88 


sbr -^dUZ] PC;, next PC z+1) subroutine jump 

Sts (M[Z] Subtract one from storage 

skn (Z’ -3- PC -f- PC+2) " Skip o.n storage minus 

Z PC +• PC+1) 

Z' ;= M[z] <0> Sign bit of the storacje 

ac'dressccl 

add -»■ {A ■<- A + M(z)) addition 

sub (A +• A ti(z)) subtraction 

skE similar to skm but 

Z' 5= !■ © il[z] skip on equal 

skZ -> similar to skm.; but 

Z’ s= (A=0) skip on zero 

and -»■ (A A A M[z] ) i\ND opereition 

Ibi (BI •<- i <16., 17>) Load base register for 

memory bank selection. 

sfl A i- 2^ Shift loft, n times 

.cfr A + 2^ Shift right n times 

The instruction set given hero is not complete by itself 
but cc'uld evolve a.s a final .set vjith further function level 
simulation of the P.A.s. 

Consider the following set rf multiple registers and 
''associative''' matching as part of the P,E» which would be 
useful in keyword matching. Suppose the processor consists 
of tv7o sets of registers as shown in Figure and. denote 

them as R and D. The R registers htdd the query descriptors. 
If the number of registers in the set R is insufficient to 
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hold all the descriptors of tha query,, only part of it x-zill 
be in R registers. Similarly the D registers hold the des- 
criptors or the attributes of a document. Then it is possible 
to envisage the following instructions fo.r r::itching „ 

When a match paroillel instruction (I'iPL) is executed 
register is matched with bitx;ise .and if the match is 
successful the corresponding match bit (one per d.^) is turned 
ON, The match parallel complete (IIRLC) instruction does 
similar matching for all possible pairs. For example, when 
there are three registers in R and D sets, in the first cycle 
of this instruction D^) , (F3?D3) will be 

matched. In the second and third cycles the following pairs 
are matched. 


Second 

cycles D3) 

(P2 og 

(P3 

Do) 

>3^ 

Third 

cycles {r.3^ 03) 

<I-i2 Dj) 

(P3 

Dp) 


?j,t the end of this instruction execution the: match bit of a 
request descriptor is ON if it matches with any one of the 
document descriptors in the D register. This will be useful 
in keyword matching and achieves the effect of an asc'^ciative 
memory in a limited sense. 

The above par.allel loatch schevae can be used, vjith the 
multiple load instruction. The multiple" or multiple load 

(GETM) instruction loads the R or F registers from a list 
having variable number of descriptors stored in the P,E, memory. 
VThen the list is complete which indicates the end of descriptors 
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in the query list or document list^ a ccrrcspond.ing indicator 
is turned ono Conditional skip instructions vjhich test these 
indicators v/ill be useful in this context to handle variable 
length data« Jin instruction for testing the match bits of the 
R registers and to accumulate selected vreights would realize 
the weighted term search much efficiently,, This ''associative" 
match corabined with the rotary file di.scussoc in Section 5„6 
will be suitable for variable length data h.and.ling, 

4 „ 6 Zibout the Architecture 

In the above architecture any intwrorocessor communication 
among the second level processors is under the control of 
the first level processor. The need for such communication can 
be minimized by judicial file organization. For instance ^ 
consider the ATNIC system described in Chapter III. One way 
to organize the files is to use inverted file orgaaiization for 
each attribute ,and let different processors se<?.rch for different 
attributes. This would need heavy interprocessor communication. 
On the either hand a sequential file organization or combined 
file organiza.tion , where all the attributes of a document are 
handled together by one processor would avoid, such inter 
processor communication. However, the s< 5 arch for a particular 
item as in the case of patent soarch would need inter processor 
control. When one of the processors assigned for the job 
finds the required document, all the processors assigned to 
this job are to be suspended from further processing for this 
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jobo This inter processor control is accomplished through 
the first level processor. 

An iraportcint advantage of thc3 architecture proposed, 
and the autonomous nature of socond level processor is its 
adaptability, A r.S. vjith one particxilar type of design can 
work along'v.^itb another r.E. desip..ied. totally in a different 
way. For instance p the use of associative ).ne.n.ory may be made 
part of the now I'.E.a rxlded to the system and the structure 
of this nev7 P.E.s will not affect thi^s functioning of the 
other P.E.s. 

Another advantage of the modularity in this system is 
the reliability obtained through oquiprf'.ent reconfiguration,. 
When a P.E. fails ^ it can be automatically excluded from 
the system for servicing. This results in degradation in 
service but the system is fail safe. This feature will be 
valuable as large nationwide information systems and informn" 
tion networks become popular. 



CHAPTER V 


SIMULATION OF THE PROPOSED ARCHITECTURE 

5.1 Some Problems in Simulation 

Simulation and analytical models constitute the methodo™ 
logy, in general, in analysis, design, and evaluation of sys- 
tems. The behavioral characteristics of a system for different 
values of the parameters which characterize it can be studied 
through these models without actually building the system. The 
increasing trend in using digital computers for simulation of 
various deterministic and stochastic systems have led to the 
development of many simulation languages (20) , The simulation 
language GPSS (44) , developed for queueing system simulation 
and CSL (107) for continuous system simulation are few examples. 

Often simulation is a time consuming task. Whenever the 
simulated system is structurally very much different from the 
simulating system, this problem is severe. For example, the 
simulation of a multiprocessor system on an uni-processor 
computer poses a number of problems (49) . The objective of 
the simulation is always to be kept in mind in designing a 
simulation model, A function level simulation with all regis- 
ters and counters of a multiprocessing system will be much 
more involved than a system level simulation which is an 
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abstraction of the f -unction level behavior « 

Analytical models developed using probability theory 
queuing -theory (28) and Markov processos (21) have some use 
in the analysis of computer systems, Hovrever, the analytical 
models are approximations to physical systems and are limited 
by the techniques available for solving -them. It is possible 
to simulate a system to a larger detail at the cost of simu™ 
lation time on a computer. But, it is not always possible to 
model a system to any finer level and analytically solve them. 
Analytical models are less time consuming and favourable in 
that sense. The results obtained in a closed forra are much 
easier to manipulate and use. 

In the simulation of multi -processors the following diffi" 
culties are common (49) s 

(i) Degradation of ability to differentiate between 
times of event occurrences, 

(ii) Simulation of simultaneously occurring events on 
a sequential machine. 

(iii) Development of a language for describing the jobs 
to be processed by the multi- processor system. 

In most of the event-oriented simulation, a monotonically 
increasing master clock is used. Event times are calculated 
during the course of the simulation and added to the master 
clock time (T^^^^) ■> As time increases, the ability to distinguish 
between the times of event occurrences decreases. Hence, it 
should be observed that the event time added to master clock 
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is not too small. (For a binary computer with n bit mantissa 
this should not be less than 1^/2^) . 

The programs to be run on the simulated rnulti«'processor 
system are to be described in an appropriate language. Even- 
tually, this language should be able to indicate the indepen- 
dent sections of a job that can be processed simultaneously. 
The PL/I language uses FORK and JOIN statements for indicating 
the possible parallel computation (8) . However, this problem 
is not severe in the simulator described in this thesis, as 
the various physical files to be searched can be processed 
in parallel. 

5.2 An Overviev; of the Simulator 

The proposed architecture has been simulated at the PMS 
level. The simulator is written in FORTRAN IV. There are tv/o 
characterizations required for this simulations 
(i) Characterization of the user group, 

(ii) Characterization of the system. 

The performance of the system is indicated by the output 
parameters like the response time and its statistics, the 
utilization of various resources and their statistics. Two 

ft 

more parameters of the simulator are identified as follows. 

The 'local parameters' are the attributes of the system being 
simulated. The number of processors, input channels and 
output channels are some examples. The "global parameters", 
also known as the "input parameters'’ characterize the 
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operating environmento The arrival rate of users,, average 
number of files to be searched per query and their distribution 
are some examples for the input parameters » 


Input 

Parameters 


Lcicril 

Paramo tors 


Output 

Parar'eters 


Fi'iure Sal.' r-imulator 

A complete list of the parcuneters used in this simulator 
is given in Table 5=1. 

The overall organization of the simulator is shown as 
a block diagram in Figure 5.2. The names of various blocks 
correspond to the names of the subroutines simulating them 
(See Appendix IV) . The arrival of a request to the system is 
simulated by the ARRVL routine which is then processed by the 
PROCES routine. The request is further characterized by the 
nxamber of files to be searched to obtain the response for the 
query. This is done by the GENFII. routine. Allocation of 
jobs to the parallel processors is done by the PROCES routine 
with the help of the SCHED routine,. At the end of the 
processing, the OUTBUF routine is called v;hich simulates the 
output processor as viewed by the parallel processors. The 
above sequence of operations simulate the processing of a 
single customer. A number of such arrivals are simulated 
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TABLE 5.1s List of Parameters in the Simulator 


I, Input Parameters 

(a) Characterizing thp users:; 

1, Average"*^ inter“-arrival time (MJIA)* 

2o Average number of files per request (XMP) 

(b) Characterizing the Systems 

1. Average processing time i-^er core load (Xlyn?) 

2. Average storage access time (XJIB) 

3. Average output tine per file (XtIT) 

(c) Simulation Parameters s 

1. Identification 

2. Number of simulation runs required 

3. Interval limits for histograms 

4 . Output options 

5. Seeds for random number generators. 

II, Local Parameters 

(a) Number of processors 

(b) Number of input channels 

(c) Number of output channels 

III, Output Parameters 


(a) Average waiting time (i.e. v/aiting time of a 
request to get atleast one f reeprocessor for 
service) . 

(b) Average transit time (time betv’eon job entry 
and its completion is defined as the transit 
time) 

(c) Average utilization of processors, input and 
output channels 

(d) Load sharing by the concurrently operating 
elements like processors and channels. 


ll The averages have an associated probability distributionT For 
example, the inter arrival time is exponentially distributed 
with the given value of XMA as its mean, 

* The names inside paranthesis indicate the FORTRAN variable 
names used in the simulator for these parameters. 
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Figure 5,2; Organization of the Simulator 




















99 


to constitute one simulation run and required statistics for 
output parameters are collected. The OUTPUT routine would 
print the various statistics of the output parameters in a 
suitable format (See Appendix V) , 

However, as the system is simulated to study the effect 
of varying local parameters, it is possible to have more than 
one simulation run for varying local pararaeters in the same 
computer run. To facilitate the plotting of the data obtained 
from various simulation runs , the PAROPT routine has been 
introduced. 

As a single simulation run consists of the simulation of 
a few hundred customers (500 in the present case) and a number 
of such r\ms are required with different values for local 
parameters, it is necessary to code the simulator efficiently. 
For this purpose, an event based approach is used rather than 
the clocked sequence system (65) , The simulation language 
like GPSS being slow compared to compiler typo languages, 
FORTRAN has been chosen. Further, the ease with which modi*- 
fications can be introduced in various parts of the simulator, 
debugging facility and the common use of the language have 
motivated the choice of FORTRAN IV. A problem oriented, 
FORTRAN based simulation language like GASP (83) tends to 
become slower due to its being general purpose and hence the 
above simulator has been directly written in FORTRAN. 

One of the main problems with simulation is, collection 
of data. For instance, the simulation of the first level 
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processor in this architecture would need dr^ta about the user 
response time at the terminals, the synonym and file directory 
table lengths, table look up techniques used with them and so 
on. Some of these data can be obtained only when the system 
is operational and the initial design needs a rough estimate of 
these parameters. In this simulator, the first level processors 
are not simulated at the function level and the literature is 
rich with such simulation (39), The requirements of a simulator 
for output processor are similar to that of the first level 
processor. Hence, their simulation is not included here. The 
second level processors, and the index file form the basis of 
simulation. The input to the simulator is the output of the 
first level processor as would be obtained in the total system. 
Similarly the output of the simulator corresponds to what would 
be the input to the third level processor in the total system. 

The core of the architecture as simulated by this simulator 
is shown in Figure 5,3, 

The system parameters to be determined with this simulator 
in the total system design are tlie number of processors, number 
of input channels and output channels (Figure 5„3). These 
three variables together are referred to as '"local parameters" 
of the simulator. The response of the system for different 
values of the local parameters is studied with the help of 
the output parameters and their statistics. By observing the 
output of the simulator, the local parameters are suitably 
adjusted to obtain the desired transit time which might be a 
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design criterion. 

An example for this iterative use of the simulator in 
system design is given in Section 5.4, Conversely, the 
simulator can be used to obtain the possible performance for 
the given resources, namely, the local parameter values. 

It should be observed that the fianctional characteristics 
of the system are reflected in the system level simulation 
through the input pareimeters as the former is an abstraction 
of the latter. For example, the size of memory, processing 
algorithm and the instruction set of a second level processor 
are the factors which determine the expected processing time 
(X!-1P) in Table 5.1, It is possible to observe three categories 
among the input parameters. 

(i) Input parameters characterizing the users or 
the operating environment. 

(e.g.); Ziverage arrival rate and its distribution 

(ii) Input pareimeters characterizing the system at the 
PMS level as selected for simulation. 

(e.g.); Average access time of the file? average 
processing time. 

(iii) Control parameters? v/hich control the options of 
the simulator itself. This is not part of the 
system but of the simulation program. 

(e.g.); Print option? number of arrivals per 
s imulat ion run , 

In the following section a brief discussion of the 
various subroutines is presented. 
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5 . 3 Details of the Simulator 

The simulator described in this Chanter is event'-bascd. 

In discrete-event-'based simulation,, the operations of the 
system are considered at discrete points in time, called 
'events'. Events cause the status of a system to change at 
a discrete point in time, I\n event is said to occur in this 
simulator v/hen a query arrives.. A system clock or master 
clock keeps track of the time of simulation. The time 
interval batv^een two consecutive arrivals is obtained from 
a random nun\ber generator with exponential distribution. It 
has been found from experiments that the number of customers 
using sn information system obeys a Poisson distribution (75). 
But the expected number of customers in a typical library 
vary with the time of a day and it is sho\.m in Figure 5,4 for 
the Library at Indian Institute of Tc-chnology, Kanpur, 

This distribution, further has a seasonal trend. In the 
simulation, however, neither the distribution over day nor the 
seasonal trend has been accounted for. 

The built'in library functions, rilDYl, I.1NDy2 , IIISIDY3 , 

RNDy4 and P>NDy5 are five independent pseudo random number 
generators with uniform distribution in the interval (0,1), 

The linear congruential method is commonly used for such 
random number generation (56) , Suitable transformation (83) 
then gives exponential distribution, used for obtaining the 
inter arrival time, and the number of files required per query. 
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Each parallel processing element in the simulator is 
associated with <?. local clock. For example, each processor 
has a processor clock and so also the channels. The local 
clock indicates the time after which that element v/ill be 
free for new assignment. As pro'-empting among users is not 
considered in this simulation the management cf these clock 
times is much simpler. Consider thtj two cenditiens , Uramely, 
processor waiting for job and a job waiting for the nro-cessor 
cis shovm in Figure 5,5, 

Suppose cin event occurs at time t^^^ as in Figure 5 , 5 (a) 
which means an arrival of a request. The processor is engaged 
until t 2 in processing the task assigned to it at time t^. 
Starting from t 2 onwards the processor is X'^aiting for a task 
which it may receive at time tj^^. Then is the idling 

time of the processor. The Figure 5,5 (b) exhibits a reverse 
condition, namely, the task waiting for a processor. The 
event occuring at tjvj finds the only processor in the system, 
(assume there is only one processo'r) busy till t^^ and has 
to wait for (t^^- t^^) units of time to get a processor. Observe 
that system clock time minus the processor clock time is 
positive in ono case and negative in the other case. This 
technique is used to simulate the concurrently operating 
components of the system. Each clock is a floating point 
number in the simulator program. 


/////////////. 


j System clock 


'N 



Processor clock 


Figure 5,5 (a) Processor waiting for job 




Figure 5,5(b)j Processor' busy condition 
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The asynchronous interrupt signals from the second level 
processors to the first level processor are simulated as 
explained below, A scheduling routine in the simulator looks 
at the clock times of the various processors and picks the 
first available processor to allot for a task at hand. In 
other words if tci is the clock time ^f the i-th processor 
and tjj the time of occurance of the event; k-th processor is 
alloted to the task such that 

(tj^ "tek^ 5. '' ^ci 

1 < i < I’/?, 

when there are I! processors in the system. 

In this simulation the overhead involved in alloting the free 
processor to a task at hand is considered negligible. 

As each event occurs the status of the system is modified. 
The clocks associated with the processors and the selected 
channels are updated. The response time and vraiting time at 
various stages are computed to collect required statistics 
on these parameters. 

The index files might bo stored, on different types of 
physical files. The channels are capable of reading the 
physical files simultaneously. Any one channel can be 
connected to any one of the selected processors. The reading 
from a physical file has been assumed to be fixed to a single 
channel. The reading mode of the physical file may be con- 
ventional in which only one processor receives information 
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from one channel. However by providing periodic synchronizing 
signals it is possible to use what are kncx-7n as "rotary files" 
and a discussion of thera is given in Section 5,6, I'Jhen the 
physical file is larger than the memory of the processor the 
processor has to keep track of the soctions of the files to be 
processed. All processors alloted to a query from the same 
terminal are connected to the same output line,. The cross 
bar switch facilitates the cvonnecticn rf a processor to any 
of the output channels. This arrangement avoids any sorting 
that would otherv/ise be needed at the output level. Since 
more than one processor might be connected to the same output 
channel there is a possibility of queueing. The selected 
pointers are stored temporarily in its memory and the P.E, 
V7aits till the output channel is free for transmitting the 
pointers. The output interrupt suggested in Chapter IV for 
the second level processor v;ould be useful in this context. 

The operations discussed abovo are achieved in the 
simulator through a sot of subroutines as ^hov/n in Figure 5.2, 
The following is a brief description of the essential sub-* 
routines in the simulator, 

MAIN? 

This keeps track of tlxe number of runs with different 
local or input parameters and calls the SIMUL routine for 
each set of parameter values. 
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DATAIH g 

The input data provide values for various input and 
local parameters of the system and they are read in this 
subroutine. There are seven separate and logically independent 
types of data. Any possible subset of these can be read 
without affecting the values of the remaining parameters. All 
inputs to the simulator are in this routine, 

SIMUL g 

This simulates the request by updating the system clock 
(SCLCK) and generating the nximber of files to be searched for 
the query. Events are scheduled one after another in this 
routine xantil the termination conditions are satisfied either 
by the number of arrivals or by the time of simulation. 

PROCES s 

In this, the waiting time of the job or the processors 
are calculated and the processor clocks are updated. It in 
turn calls DRUMAC, for simulating the index file access and 
OUTBUP, for simulating the output conditions at the end of 
which, transit time of a job is calculated, 

DRUIiAC 5 

This subroutine simulates the index file access. Each 
channel has a clock associated v/ith it. These channel clocks 
are used in the same way as the processor clocks for obtaining 
the channel idling time or v/aiting time of jobs for the input 
channels. 
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OUTBUF s 

This is similar to the DRUrihC subroutine but used for 
output channels , 

SCHEDg 

This subroutine allots the processors for tasks. In 
this simulator each physical file is allotted to one processor 
and the processor that becomes free is assigned to the task 
at hand, A job having more files to be searched, automatically 
gets more number of processors than others. It is possible 
to simulate other types of scheduling algorithms by modifying 
this subroutine (28) . 

In addition to the above major routines there are sub"' 
routines for generating random numbers of desired distribution, 
for collecting statistics on output parameters and printing the 
output in proper format. 

The main objective of simulation is to study the effect 
of varying the local parameters on the output parameters to 
obtain an optimal design. Hence, there is a provision in the 
simulator for automatically varying the local parameters over 
a specified range. It would have been ideal to decide the 
next set of local parameters values after observing the 
output paieuneters for the current values at an interactive 
terminal. The limitations of the batch processing system* used 
for this simulation has motivated this scheme of automatic 
parameter changing in the simulator, 

* IBM 704^1401 computer system with IBSYS operating system. 
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Use of the Simulator in System Design 
* » 

In this section,, the output ns obtained from the simulator 

and a method of design are discussec„ Let the throe local para- 
meters, namely, numloer of processors, input channels and output 
channels be denoted as XI, X2 and X3 respectively o For each 
triplet of values XI, X2 , X3 we get a set of values for output 
parameters. Consider the output parameter, transit time of jobs, 
for example. From the time a job enters the simulated system 
till it passes out, there are three queueing points. The job 
has to seize each of these facilities and be served till comple- 
tion. Congestion at any one place will lead to an increase in 
transit time. Providing the facilities in large number is a 
solution but not optimal. One of the objectives of this simula- 
tion is to determine such an optimal selection for the given 
performance requirements and cost. 

By allowing one of the throe local parameters as a free 
parameter wo obtain the output parameters as a function of this 
parameter. Figure 5, 6 (a) depicts one such function where the 
transit time is plotted as 'a function of the numi:;er of processors. 
The initial region of the graph indicates a processor (parameter 
XI) bounded condition. In this rcc;i!.,n increase in XI results 
in decreased transit time. In region 2, the system is bounded by 
the fixed parameters, namely X2, 73 cr both. The increase in 
XI does not decrease the transit time. If the waiting time of 
processors for input and output channels are plotted as shown 
in Figure 5, 6(b), the parameter which is responsible for 
saturation in region 2 can be ic^entified. F^'r instance, b.v 
observing Figure 5,G(b), the X2 parameter is found responsible for 
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Average 

Transit 

Time 



Figure 5.6(a) P.E. 


vs Transit time 



Figure 5.6(b): P.E. vs waiting time 
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Saturation rather than X3 and wo say it is input channel 
bounded. Any further decrease in transit time is possible by 
increasing X2; the number of input channels and not by mero 
increase in XI. The system design is an iterative experimentation 
with the simulator to obtain the least set of values for <X1,X2;,X3> 
satisfying the requirements on transit timo;, cost and perhaps 
the utilization. 

The local parameters. XI, X2 and X3 along three different 
axes and the output parameters,- one along the fourth axis give 
a set of planes in four dimensions. The system design would be 
the selection of values for the local parameters from such plots. 

However, such a plot even with five values for each local 
parameter would need more than tvro hours of computer time even 
with reasonably fast computer and simulator taking one to three 
minutes per simulation rxin. Thus it vrould be advantageous to 
record these performance curves for different input parameter 
values so that the recorded graphs will bo useful as ‘'design 
curves*' for similar system design. 

Observing Figure 5,6 (a) wo note that it may be approximated 
by an exponential function. Tui exprnentiri approximation for 
such graphs obtained from the simulator are made using the 
least square criterion, /.Ithcugh the exponential approximation 
is good in region 1 [Figure 5,6(a)] it would be very much off 
the actual values obtained from the simulator in region 2. h 
nonlinear function approximation with exponential behaviour 
in region 1 and as a limiter in region 2 v/ould be closer to 
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the actual graph but difficult to use in analytical models. 

A linear pclyncinial fit with the logarithms of the ordinate 
gives the parameters •..^f the exponential approximation, namely, 
amplitude and time constant. 

The follov^'ing formulae , cbtained by the least square 


approximation 

(78) are used. 
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The transit time t^. is a function of XI, X2 and X3 and 


can be written as 
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In general: 

fj^(X2,X3) 

T2= f2(X3,X3) 

"'•3“ ( 5 . 1 ) 

where f ^^ ( . ) f2 ( • ) <’-ncl f 3 ( . ) stanc*'. fcr arbitrary fiinctions , 

The reqtairement on tho transit tirie can bo oxyirossoc' by tho 
following inequality 

r. e' '■-^^2 o ”’'^'35<3 <3 (5.2) 

Each of the local parameters of tho system is associated with 


a cost value which determines the total cfjst C'f the system. 
Then the following set of inequalities should be satisfied in 
system design: 


^11^1 ^12^2 ^ ""13^3 1 C 


A e"'^l^l e“T3X3 < p 

^2 2C3 > 1 (5.3) 

The refer to the per unit cost of the processors and 

channels and C is tho total allov/able cost of the system. 

As an example in iteratively using the simulator in system 
design, consider the follov/ing case. T'-'-lo S.". (a) c.-ivos 
the input data or values of tho input parameters for this case 
study. Table 5.2(b) is the list of output parameters for 
some selected values of the local parameters <X1, X2 , X3>. 

The local parameter value <1,1 ,!> shoves that the transit time 
is unacceptably high in an on-line system. Suppose, it is 
required to chose ■^X 2 ^,X 2 ,X 3 > for an average transit time of 
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TABLE 5.2 (a) ^ Data for the case study 
Input Parameters " 

Mean time betv/ean two consecutive arrivals 

= 60 secs * (x;la) 

Mean nuraber of files required per r'icucst 

=50 files (XMF) 
(Distributions Exponential) 

Simulation Control ParaPieter s s 

Nxunber of request per simulation run 

= 500. 

System Characteristics; 

Number of files in the system 

= 100 

Mean time for processing one file 

= 1.5 sec (XI4P) 

Mean time to access a file 

30 sec (XFS) 

Time spent in the output of pointers 

= 2 to 5 percent of „ 


* Names inside brackets correspond to the variable names in 
the program which identify them. See Appendix IV. 
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about 35 seconds. Prom Figure 5,7{n) it is seen that increasing 
with fixed values for X 2 / does not improve the transit 
time to any extent. The queueing or waiting time for X 2 is high 
and the number of channels needs to be increased. As observec? 
from the Table 5, 2(b) and Figure 5„7{a)r it is clear that increa- 
sing any one parameter by itself is not sufficient. 

Hence, the follovTing? procec.ure has boon adopted in usinc 
this simulator for system design. 

(i) Choose a trial value for <x^,X 2 ,X 3 > and obtain 
a simulation run, 

(ii) By observing the output parameter values, choose 
as the free parameter that local parameter value 
in front of which there is maximum congestion. 

(iii) For different values of this free parameter 

observe the output of the system. Vary the free 
parameter until the congestion shifts to one of 
the remaining two local parameters, 

(iv) At this stage, step 2 is repeated. 'Then the 
output parameters reach the desirable design 
objective the procedure terminates, 
applying this iterative search procedure for the 
data given in Table 5.2(a) the minimum value for <Xj_,X 2 i,x^> 
obtained is <5,5,1> and the output parameter values for this 
selection are shown in Table 5.2(b). 




Figure 5, 7 (a); Transit time and v^aiting 


utili- 

zation 



Figure 5.7(b) s Utilization curve 
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The above iterative procedure is the one which searches 
for an optimum value for <Xi, Xj, X 3 > in a multi cUmensional 
plane in order to satisfy the requirements on response time 
and perhaps on cost. The judicious choice of parameters for 
successive iterations v/ould reduce the simulation time as 
compared to the brute force search over the entire plana. 
Observe that a brute force search by starting from <ljlj,l> 
is at least theoretically possible., as X 2 and x^ are all 
integers and bounded. However p a good starting value for 
<^ 1 , X 2 , X 3 > and on line decisions for successive trials would 
considerably reduce the simulation time. 

The average utilization of processors and other parallel 
processing elements decreases as their number increases as 
shovm in Figure 5.7{b),The average utilization of various 
resources when the constraints (Equation 5,3) are satisfied 
would be a measure of goodness for the system design. The 
decreasing utilization curves shown in Figure 5.7(b) also can 
be approximated with exponentials. 

There arc three different inequalities of interest to 
the system design both from the point of viev/ of architectural 
and engineering considerations. They ares 

(i) Transit time inequality called as response function, 

(ii) Cost function, 

(iii) Utilization function. 
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The utilization function could be a weic^htec!. sum of individual 
utilization of the resources and it would be an estimate of 
the running cost of the system. These three functions as ob-" 
tained from the simulator can be approximated by analytical 
expressions. This suggests that an analytical method of approxi- 
mately choosing the starting values for possible. 

A discussion cibout this analytical model is presented in the 
next section. With this as the first estimate cf the values 
for local parameters , the system design procedure would be as 
given in Figure 5.8. 

5.5 Use of an Analytical Model 

The requirements of an information system design is to 
select the set of optimal values for <ri,r^,X 3 >, satisfying 
the cost and transit time constraints of Equation (5.4). With 
the cost and response functions as the tv/o constraints and the 
utilization function as the objective function, it is possible 
to formulate the problem as a mathematical programming problem. 

If the three functions f 2 ^(.), Equation (5.1) 

are assumed to be constants, the response time inequality can 
be made linear. The following set of equations are obtained 
for the data given in Tabic 5,2, 



^ ^ ' 


(5,4) 
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Figure 5.8s Iterative design procedure 
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where <C, C-> are per tinit Cost vectors anc" the constants 

, <kj^,k 2 yk 3 > might depend on the cost vectcr. The different 
constants in these equations are obtained by apprcxiiuatiisg ’the 
graphs shown in Figure 5.7(a) and 5. 7(b). This sot of equa- 
tions can be solved by the methods develoi.xid for nonlinear 
prograivjming (41) . By expanding the exponential functions in 
the objective function in Taylor scries,, it could even be 
solved as a linear programming prrblein (40) . nowcvorf this 
approach has the following disadvantages. 

(i) The distribution among X 2 X 3 > is not accounted 

for. For example; the re.sponse time constraint in Equation (5.4) 
can be satisfied by fixing <x^,x^> as < 1 ; 1 > and increasing X 3 
suitably. But., this is not true in practice. This difficulty 
arises mainly because the functions f ^ ( . ) ,. f 2 (-) and f 3 ( . ) 


have been assumed to be constants. 

(ii) The objective function is not very reliable and the 
sensitivity of the optimal solution for variations in objective 
function coefficients should, bo ccnciderGd. 


(iii) The variables <x^.x 2 ,.X 3 > themsclvo.s a.ro basically 
integers . Solving the problem as a linear programming problem 
and rounding the solution vector to integer may not .be valid. 

Hence a straight brute force search technique has been 
used to find the starting value for the simulator from these 
equations. By considering the service times at the three 
queuing points in the simulation (See Table 5.2(;0) the following 
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ratio between X 2 and x^ has been chosen 

1 


X3_ 5 X2 ; X3 = 2 r 4 


(5.5) 

For different values of <x^j. x^, X 3 > satisfying Equation 
(5.5) j. the response time from the analytical model is calculated 
until the required response time is obtained. If the combined 
cost of <x^;, x^j. exceeds tho total cost constraint before 

the response time constraint is satisfied there is no feasible 
solution. The values obtained, for 


cost vector = <6,.5,1> 

total cost = 100 

transit tim.e ~ 35 sec, 

is x^, x^ , x^ = <4,8(,2> 

Values obta.ined from the simulation 

for the same transit time arer <5,5.,1> 

As the analytical model is an approxir etion to the simulation 
model these tv-ro values are not same. However using 4^8^, 2 as 
the starting value for the iterative use of simulator for 
design, would neecl much less numl;)cr of iterations than otherwise. 
5.6 Possible Extensions to the Sim ulator 

A nurtiber of improvements aro possibl-; for this simulator. 

For example, consider the '-rotary file'- referred earlier in 
Chapter IV, The characteristics of this file is that the 
information contained in it is displayed constantly, whether 
it is required by the external system or not. /3lsc assume 
that synchronizing signal, (possibly a fixed bit configuration) 
is given out by this file periodically. Any processor can 
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start recoivinj the information vrith the start of the synchro- 
nizing signal and vjait for one ''cycle'' to receive the complete 
information from this file. Using such files, it is possible 
to read the contents of a physical file for n-.;re than one 
processor simultaneously. As requests from the processors 
for file access are random, this provides a facility to start 
reading the file from any (logically correct) point and avoids 
the waiting time due to latency delay in the physical devices. 
Figure 5.9 explains the concept of a rotary file. The rotary 
file concept can be included in the simulator vritheut any major 
change. Then the processors accessing the same file will not 
be serviced in sequence. Hence the waiting time of processors 
for channels will not depjend on the service time required per 
customer but on the longest variable length record stored in 
the file. The two disadvantages of the rotary file systems are 
as follows" The synchronissing code signal should not form a 
valid code either for thci pointers or for the keys of various 
documents. Secondly, when the physical file size is larger than 
the memory of the l-.E. the logic required to handle it in 
chunks is necessary for each processors rather than for each 
channel . 

When the number of files is more than the number of 
channels a facility to move the read/write heads is required. 
I’^Jben such movable head devices are used it is possible to use 
this simulator in determining the scheduling algorithms and 
file organization which would minimize the lateral movements 
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4 


Synchronizsing 

signal 


Reading Head 



Figure 5.9s Rotary file. 

(More than one processor can read segments of the file at 

the same time) 
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of the head assembly. 

The processor allocation algorithm used is first-come- 
first-served (FCFS) . It is possible to experiment with other 
service disciplines like priority among sarvicos using this 
simulator. 

The variation in the arriv--'! rate as a functiam of the 
time of day can be simulated by cc'.nsidorin j the simulation 
over a period of time. Aii additionrl provision has been 
made in the simulator to control the simulation by the time 
of (not the computer time) simulation rather than the number 
of arrivals for this puigpose. 


CiHiiPTER VI 


SUPERIJTCSED CODING AND THE 2iRCniTECTOP,E 


6 , 1 Har&rar e Arientec". S e arc h 


The ccpiputer r-.rchitecutre cliacass-GC'. 
of a hierarchy of procGysoro in ^i'ich the 
were designee' for special a.'pl j.catifn » ^ 

purpose sycteia prupoGod for infcri.iatirn r 
restricted to any indexing sch.onx.! Mr tilt. 


in Chapter IV consists 
second level p.rrcessors 
h'''Uijh it is <a spiecial 
•jtrieval,. it is not 
■•n:c;ani 2 ati''‘''n , If a 


particular indexing scheao and. file organisation aro assumed ^ 
then the search can be raado faster and. the cost of a typical 
second level processor would, become that of a few gates and 
registers. Such second level processors have the program 
logic permanently v;ired-in and <are less flexi.ble. 


By making the design highly special •'■urrc'se, it is possible 
to make it faster at the cost of flexibility., Tor instance, the 
computer system discussed in Chapter TV can be used in any 
indexing system, whereas the hardware oriented design discussed 
below is not so. However thelatter would be faster compared, 
t© the forraer. 

The hardware oriented search is based on the coding 
scheme developed in (32) and the parallel access disk file 
(47) . The key words of a document are coded and the infer- 
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mation is packed into a word of sufficient lengths The parallel 
access disk file has 64 tracks vjith a capacity of 4K v;ords (16 
bits) per track = Sach track has a separate read/write heado The 
disk raakes one revolution in about 30 railliseconds . During this 
period the information in the file is constantly being read into 
a register at a rate of about 2c 18 mega bits/sec« 

In the hardware oriented search discussed below ^ the coded 
binary vectors representing the documents v;ould be stored on 
this parallel access disk file. This can he stored sequentially 
along the tracks or in ''cylinder mode"c In cylinder mode the 
bits of a binairy vector are stored in the corresponding bits of 
different tracks and in this mode it is possible to read all 
the bits in this binary vector siraultameously . Hence it will 
be called ’’parallel mode" and the track mode of storing will 
be called "serial mode"- Each of these methods have their own 
advantages and disadvantages. Before discussing the search 
methods, the coding scheme referred earlier is in sequel. 

6 . 2 Superimposed Coding 

In this method of retrieval each document is represented 
by a set of key words, Common endings such as s. ed^ and 
compoxond endings like "fully" are removed. The resulting 
"word stems" are used in further coding. Each record is 
represented by a code word. Let there be N bits in this code 
word. From each v7ord stem an integer is obtained, by adding 
the numeric values of the letters in it. This integer is 
used to choose a random nxmber from the uniform distribution 
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of integers between 1 ancl W. If this value obtained from a 
random number generator is j. then the j ' tli bit in the code 
■word is sat to 1. Thus if there are n keywords (n N) ! in 
an ideal case» there T,rill be n cneo in the coco ’word and M-n 
zeros o However this may not be alw'ays true due to the 
following tv:o reasons. 

lo When the nuiaeric values of letters arc added ^ 
two v-Iifferent word stems having the same set 
of letters will give the same integer code, 

2, The random number generator may give the same 
value in the range 1 to H for tv/o different 
input integers. This again results in non- 
unique coding. 

The spurious match due to non \anigue coding can be con- 
trolled, It has been shown that on optimum value for n is 
obtained when N = 2,2n where n is the nuitiber of keywords to 
be coded (32) , 

The given query is coded in the same ’way using the above 
algorithm as shown in Figure G,l, Then each code v;ord in the 
file is matched against the query code. 1. document is retrieved 
for the given query if the i-th bit of document code is one 
whenever the i-th bit of the query code is one for all values 
of i in the range 1 to N. Each document code has a pointer 
associated with it which refers to the excerpts in the text 
file. These pointers can be used in examining the text further. 



Obtain the word stem for 



key words 

^ 

Delete 

List 


Obtain a random number, 
1 _< R £ N for each 
word stem 


Random 

number 

generator 


Initialize the code word to 
zero • f^et the corresponding 
bits in the code word to 1 


Transfer the code word for 
further processing 


Stop 


Figure 6.1; Algorithm for s upe rimno s cd ' coding 
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, When the probability cf spurious matches is low the above 
method can be directly used for information retrieval. Otherwise 
this method is useful in rejecting the set of docviments which 
are not relevant to the given query. The main motivation of 
this algorithm is to utilise the bitwise oarallelisra available 
in most of the modem computers for the inherent parallel 
processing possible in keyword oriented systems. In otter v7ordSj, 
query terms ..... are matched ■'jith the document terms 

d2f ...j d^.;( simultaneously . 

However this superimposed coding scheme is inflexible. The 
indexing scheme is fixed and the query may have only Boolean AND 
operation on specified keywords. But using this scheme it will 
be possible to search a file of about 64K documents each v 7 ith 
30 descriptors in 30 milliseconds. 

® ° ^ Bit serial and bit parallel methods of information storage 
In both bit serial and bit parallel methods the following 
are the three stages in query processings 

(i) Query from .the requester is converted to super™ 
imposed coding by the first level processor. 

(ii) The indexed disk file and associated logic 

retrieves a set of pointers as response to this 
query. 

(iii) The text pointers are used to access an "abstract 
file" for information display. 
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Bit Seric.l Metho d 

In Lit serial mode of storing the code v^crds all bits of 
a code word are in the same track = Suppose a track has 4K bits 
storage and a code word and pointers are 32 bits long. Then it 
is possible to store 54 code words and their associated pointers 
in one track. The query cede con be stored in a circulating 
registers, Q, Ls each bit of the code vrord is read from the disk 
it is matched with the corresponding hit ch; the query code. At 
the end of the code matching the pointer is gated to the memory 
of the third level processor if the match is successful, Scuh 
matching logic can be provided fer each track of the disk. Let 
q and d be n^-bit query and document codes. Then the following 
Boolean expression M is 1 when they match 

M = (q^+ dj^) , (q^ + 6.2) - • ■> = •l'’‘‘^''n“l'^ ” 

T/iJhen the i"-th bit of the query is 1 and the corresponding bit 
of the docunxent code is 0 one of the Max terms is zero and. so 
the function is zero. When the i-th bit of the query is 0 the 
corresponding maxterm is 1 irrespective of the document code 
bit being 1 or 0, This logic as used in matching is shown in 
Figure 6,2* for a typical track. 

For the kind of disk discussed in Section 6,1, the 

is 

interbit transfer tirae/cf the order of 500 nano seconds. The 

matching logic response time could be ma.de smaller than this 

to avoid any buffering. Typical integrated circuit gates 

have a delay time of 20-50 nano seconds (112) . 

*' For the symbols" use rTTh”' this' logic diagram see"7ppendix' 


! 





In forma 
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igure 6.2: Matching logic 
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Legend of Figuife 6 » 2 ^ 

Initial Statue 

Si register holds the corDplemerit query crd.f: Q 
FF2 initial ott. 

Kl, K2 are initially resets 
Latch Ll i,^ triggerec. 

Z — lo "r ^ 

PFls is set to 1 or zero as 1 cr 0 ia road fri^'in disk 
(document code bit) 

PP2 ! is a control f lir. flop 

PF2 is 1 when document bits are counted, 

FP2 is 0 when pointer bits are counted, 

Kls Counter which counts n bits of the document code 

and also controls the FF2 tc route the clock pulse 
oither to K1 cr K2 . 

K2s Counter similar to Kl; counting the bits of the 
pointer information. 

FP3s Helds the result of the Boolean expression Z above 
Lis Latch supplying the Boolean constant 1 in the 
expression Z. 

L2,L33 Reset and set latches. 
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The main difficulty in this bit serial mode is the need 
for concurrent memory access in storin';;; tho selected pointers. 
Sup»pose documents storeid iiti two different tracks are matched 
and found relevant to the given query at the same time instant. 
Pointers from both these tracks will have to be stored in the 
memory before the selection of the next pointer. Under v^orst 
case the memory access time should be nx/lt, '/here 

n; nuaiber of bits in one code word that 

in 0 . pointer 

t; intorbit transfer time 

k; number of parallel read and match logic 
When n = 64? x = 500 n.s. and k = 64, a 300 n.s. memory would 
need no buffering. The cost of such fast semiconductor memories 
are comparable with convential core memories (4) . 

The typical diagram for bit serial method is shown in 
Figure 6.3. The query is held in the circulating register, SI. 

The counters K1 and K2 of Figure 6,2 can be common to more than 
one track depending on the fan out limitations. The black box 
named M refers to the matching logic of Figure 6,2 but excludes SI, 
K1 and K2, 

The bit serial method has the following advantages: 

1, The code w^ord can be c f any length and is restricted 
only by the capacity cf the storage device. 

.2. Failure in one track will not cause tho total failure. 

3. Different tracks can store different lengths of code 
words and its use is discussed belov;. 



Output Processor 
Ilemorv 












Consider the five attribute. .’'THIC systeiA discussed in 
Section 3. 3= For survey and tutorial type docuraents the 
nuraber of citations will bo nuch :o,ore eciEpfre'" to their docu 
ments. This ■'//ould need Icncer code words anC', usinp the feame 
code iword length for all docments m.w/ net b-j cccnomical^ 

In bit serial mode such cede words can be stored on a separate 
track. 

Bit Parallel tlethod 

In this method all the bits of a co.de v/erd are recorded 
in the cylinder mode eind can be read at the same time. The 
query bits are matched with the document code bits all at the 
same time. If the match is successful the pointer associated 
with the document code word, is selected. Suppose the disk 
has 64 tracks and the document code and the pointer together 
need less than 64 bits. Then it is possible to store one docu- 
ment along a radial line of the disc per bit position. When 
the track has capacity for 4h bits it in possihlo to store 4K 
documents. 

Unlike tho bit serial method there is no buffering 
problem. Concurrent access to tho memory cf the third level 
processor will not occur in this method. Kov>;evcr the dis- 
advantage in this case woul.d be lack of flexibility. To 
store documents with longer code words special logic is to be 
incorporated. One way to do this could be to use double 
length or triple length codewords and use one of the 64 bits 



140 


to indicate such special codes. Such multiple length code 
words would need special attention in the preprocessing stage 
itself. Details cf the logic required for bit parallel mat- 
ching is Shown in Figure 6.4. The ra bit pointer information 
ahd the ii bit document code arc read in parallol into P and D 
registers (Figure 6.4), Then D rc-jister contents are matbJiSd 
with the query stored in the Q register in bit par^iiSi., T''7hen 
the- match is successful the/regietcr contents are transferred 
to the memory. It is evident from the Uro figures that the 
logic of bit parallel method is much simpler. Comparison of 
the components required betv/een bit serial method and bit 
parallel methods is presented in Table 6,1. 

Prom the sample data used for ATNIC system referred to 
earlier, the follov/ing statistics have boon obtained (the 
figures are roxinded upwards) . 

Average number keywords/docuoent ; 10 

Average nuptber of authors /do cment = 2 

Average number of citations/document = 7 
However when indexing is more specific as much as 35 keywords 
per document is not xanccnuacn (25) . The main difficulty with 
this inflexible scheme is v/hen there is need to hai'idle such 
specific indexing. Using longer code word length of any 
length is not a solution. Note that the superimposed ceding 
does not U80 storage efficiently. Further, representing the 
infomation in the form of connected graph or tree as used 
in SMART system is difficult to process with this scheme. 
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6 o 4 Linear Query Processing 

A query fonnulated with keys and Boolean operators is 
often known as Boolean query and the query v/ith weights 
assigned to keys is called "linear query". The later has 
advantages in assigning relevance nurober to docunients using 
the probabilistic indexing and the user assigned v/eights. 

From this point of view the above hardware oriented search is 
modified in this section to account for linear queries. 

Suppose the v/eights assigned to various keys can be 
represented as a 4 bit binary number. Then the matching 
logic of each track in bit serial method V7ill have the following 
additional functions to perform, 

(i) T.Jhen a non zero bit of the query matches with that 
of the document code, the corresponding weight is added to 
the accumulator. 

(ii) When all the bits of the query code are matched 
with the document code, the one 'sc of the threshold 

is added to the sum of the weights. 

(iii) If there is overflow in the above addition, the 
pointer information is transferred to the memory for further 
display of this selected document. 

The logic diagram shoxTO in Figure 6.5 in conjunction with 
Figure 6,2 V 70 uld achieve the above functions. W and t registers 
held the weight of keywords and threshold for selection res- 
pectively. Wj_ indicates the weight associated with the bit 
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being matched, At the beginning of each document code match, 
the accumulator is reset to zeros. Thereafter it keeps on 
accumulating the vreights of the matching keys. At the end of 
document-query code matching,- signsllad by the counter Kl 
(Figure 6,2) the ones complement of x is added to the accurau'' 
lator and checked for overflow. If there is an overflow 
(which indicates the sum of the weights is greater than the 
threshold) the pointer information of this document that is 
buffered into S2 (Figure 6.2) is selected. 

This ir-odification v/ould require an adder and an accumu- 
lator for each track and sorae additional registers. It is 
necessary to complete the addition process before the next 
bit is read. In linear queries if the weights of the keywords 
are constrained to be equal , it is possible to replace the 
adder by a simple one step counter. 

In bit parallel match, as all the weights of the matched 
bits need to be added before the next bit arrives the situation 
is more tight, A pair-wise addition of the selected v^eights 
would reduce the total time required for addition; . Further, 
splitting the process of addition of weights into independent 
processes vrould be helpful when the addition process takes 
more time than the interbit transfer time, When processing 
time is large compared to the transfer rate and splitting 
the process into independent processes is not possible, the 
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orgnization like the one describerl in Chapter IV will be 
usefulo 

The raatching logic of Figure 6.2 is expandec'. in this 
section to the level of a processor caneble of doing integer 
addition. The prograra for this processor is hard V7ired„ It 
is possible to extend tiia pov7er cf this processor rith an 
associated nencry to the L'ivel of a -'raicro cbiaputer'- as 
discussed in Chapter IV. 



CnrPTEP. VII 


CONCLUSION iV'ID SUGGESTIONS FOP FURTHEP. RESEARCH 

In tlriis thesis, a multi -attribute system cf information 
representation and its role in improving the performance of 
a retrieval system have been consideredo i^n algorithm has 
been developed to pick a set of attributes to be used in or- 
ganizing the given data base for retrieval. This algorithm 
when applied to a samiple data in library information system 
gives keywords and the citations as the two important attri- 
butes (23) . It has been shov/n that multi attributes and 
user feedback in retrieval system would statistically improve 
its performance, 

A special purpose computer system, well matched to the 
problem of information retrieval has been proposed. This 
architecture takes advantage of the special nature of the 
information retrieval problems, namely, hierarchy cf operations 
and inherent parallelism. A. system level simulation of the 
parallel processing section of the architecture has been 
carried out in FORTIIAN. The use cf this simulator in selecting 
the number of parallel operating elements like processors and 
channels in order to meet the requirements of an information 
system in terms of the response time has been presented. An 
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example for this design procedure with an iterative use of 
the simulator is worked outo The use of an analytical model 
in selecting the starting Vciluea for the free parameters of 
the simulation model has been discussoc. The starting value 
obtained from the analytical model would reduce the number 
of iterations (in using the simulator) tc arrive at the 
desired ccnf iruraticn. 

By restricting to a particular type of indexing and coding 
it has been shc-wn that it is possible to device a special purpose 
hardware oriented search » In this mathod.,- a set of few gates 
and registers together constitute a second level processor- when 
they are used with a parallel read ddsk. The fnctf that highly 
specialized systems lead tc a cheaper design at the cost of 
flexibility is reinforced by this. By incorporating elementary 
capabilities for addition and comparison intc these processors 
it is made possible tr> process linear queries with such specia- 
lized processors. 

It is interesting as well as encouraging to note that the 
feasibility of designing such a special purpose system by 
observing that a mini computer of OE nemcry (word, length of 
8 ~ 12 bits) for integer and Boolean data nr'^cessing costs 
about five thousand dollars (11) . T\ further decrease in 
hardware cost is possible with the recent dev^slonments in 
LSI (35) . All these support the conclusion that the d.esign 
of a parallel processing computer system with special 
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applications tc inforraation retrieval problep^s would not only 

be technically feasible but also econcnically viable o This 

thesis presents a typical architecture for the same a.ncl the 

raetlioc: of selecting the ccmocnents cf this architecture tc 
the 

meet/^iven requireraents cf on infcrxuation systen. 

POSSIBLE EXTEMSIOITS 

(i) The architecture proposed in this thesis v;ill be use- 
ful in on line interactive retrieval, h query language c.evelop- 
ment with due consideration to file organization and. user inter- 
action is worth investigating, 

(ii) The reliability that can be obtained with the parallel 
processors in the second level and the modular nature of the 
system suggest its application in information networks. 

(iii) A fxanction level simulation of .the P.E.s proposed in 
this system would be useful in improving the tentative selection 
of operation codes mentioned in this thesis, A detailed study 
of the various algorithms used in keyvrord matching systems and 
graph structured information matching will be useful in this 
context, 

(iv) A complete simulation of the architecture with due 
considerations tc the first and third level, time sharing 
systems is interesting. The simulation model presented in this 
thesis vixill be part of such a total system simulaticn. 
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(v) Various constants used in the analytical model vrould 
depend upon the input parameters of the simulator like the 
arrival rate and average pr-cessing time per customer, obtaining 
the required, constants for the analytical model as functions of 
the input parameters and their study is interesting. But, this 
would involve considerable amount cf cernputer time, Hov/ever, 
such a study is v^'orthwhile as this kind of computer architecture 
becomes popular. 

(vi) Design and fabrication of the special purpose hardvi’'are 
oriented search discussed in this thesis is worth considering 
and will pose no new problems (113) . 

(vii) Design and testing of the complete architecture seems 
to be an interesting project particularly v/hen it may be based 
on time sharing mini computers available in the market. 
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APPENDIX I 


Distance Measure 

The similarity measure defined in Section 3.3, namely, 

eiu (d^, d2) = Tdii-<^2i I ^ 

= 0 other'.'7ise 

is non negative and symmetric. But,- the triangle inequality 
need not be always satisfied. For example, let 

dl = <5> 

6 ^ = < 10 > 

6.2 ~ < 6 > 

Then 

1 1^1. 
jTe^f llO-e'C - ^a^se 

Consider the following distance measurer 

Distance between A and B; d(A,B) = |a •* B | 

XT 37 

where Aj. and B^ are the relevance numbers (real numbers) assigned 
to the documents A and B with respect to the query under consi- 
deration. In its simplest form this could be the '-grade of 
membership'' of the documents as discussed in Section 3.5. 

This measure defines a distance between two documents 
in the light of the given query. Considering the set of docu- 
ments as dom.ain, the "membership function" (See Section 3.5) 
maps them into the real line interval [0,1], The above dis- 
tance measure is defined over the range ratherthan the domain. 

It is to be noted that the mapping involved is not "one-to-one". 
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There can be more than one document giving rise to the same 
grade of membership for the given query- However, as far 
as that particular query is concerned such documents are 
equivalent or equivally important- 

A number of other similarity measures satisfying the 
metric properties have been disdussed by Salton (96) . Till 
such measures define the distance between two documents 
independent of the query. 


i^PPENDIX II 


n - level indirect relatedness 

An n-level indirect relatedness becomes vreaker as n 
increases s 

Consider the three documents 
By theorem 3 

d(A,C) d(A,B) + d(3,C) 
where d(<.) is the "distance*' 
measure, A*dd a fourth document 
D as shown in the figure. Two level indirect relatedness 
between A, and C i-- bt twined from- d 2 {/'..rC) , th ■ distance 
b t\7oen i" and C thr' uoh D and 3, Then 

d2(A,C) < d(A.,D) + d(D,B) + d.{B,C) 
as d(A,B) £ d(A,D) + d(D,B) 

d„ (A.C) > d(A,C) , 

The equality holds when all the documents are similar. Otherwise 
the distance between A and C is more when the reference is 
through B and D than the distance through 3 alone. As distance 
increases the relatedness would decrease. 

For n level indirect-relatadness the same argument can be 
extended by including the remaining levels. 
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APPENDIX III 


PKS and ISP notations 

PMS Notation 

This notation has been developed to describe the computer 
structure at the information flov; level. The seven basic conpo- 
nents of this system, are described belov;: 

1. Memory (K) ; A component used to store information 
units (i'-units) . The i- units are not changed in 
any vray when they are stored in memory. 

2. Link (L) i A component that transfers i-units from 
one place to another in a computer system. Again 
there is no change in the i'-units v/ith this compo- 
nent. 

3. Control (K) 2 A component vrhich evokes the operations 
of other components. Except the processor, P, all 
other components need some control to set them for 
work. 

4. Switch (S) - The switch is associated ■'■■'ith a set of 
links, and can set some of them and break the rest. 

5. Transducer (T) s The transducer changes the i-units 
to encode a given meaning. For example the I/O 
device is a transducer. 

6 . Data Operation (D) s This component produces i-units 
with new meaning, e.g., arithmetic, logic and 
shifting operations. 
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7« Processor (P ) t A component capable of inter- 
preting a program and executing it. It consists 
of the components already mentioned, 
i^.s an example the classical diagram for Von Neumann 
computers is shown below in PHS notation 



Information path Xs external environment 

Control signal path 
ISP Notation 

The ISP description provides a scheme for specifying 
any set of operations in a processor and any rules of inter- 
pretation. These two combindly describe the behaviour of a 
processor. The set of all operations in a processor are 
classified into two parts . One part contains those necessary 
to operate the D component in the PMS level and the other to 
operate links, switches, memories transducers etc. This is 
the level in which the programmer V70uld like the processor 
to be described. All data operations are characterized as 
working on various data types . For example , signed or 
unsigned integers , character strings cind real numbers are 
some data types. 


The instruction set is described by the instruction 
expressions which is of the fornix 
Condition ->■ ' Action -sequence 

When tiie condition is satisfied the action sequence is evoked. 
Each action on a sequence has the forms 

Memory expression -e data expression 
The left arrow indicates the transmit operations. The left- 
side specifies the memory location and the right side describes 
the information pattern to be placed. Each action in the 
sequence may itself be conditional of the form.; condition ■> 
action-sequence. Secondly, any sequential order or actions 
is indicated by the word "next . For instance 

(A B, B A) would swap their contents 

{A ■*- Bs next B A) leaves A and B v/ith the 

contents of B. 

In ISP notation the processor state is described by providing 
names for the various bits in the registers 

A <0sl7> Accumulator. 

This denotes an accumulator v;ith 13 bits addressed as 0,1 17. 

Similarly, other indicators and special registers in the C.P.U. 
are described. The memory state is described by assigning 
name to each addressable unit (vrord or character) and to 
each bit in this i-unit. 

M [Os 37770] <0sl7> denotes a memory with 2K 
words and each word has 18 bits. The letter H stands for 
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memory and the suffix distinguishes from other memory units., 

A number of notations are used in describing the action 
sequence like? [ [ for concatenation, — i "or ’.‘OT,. A for 

and so on, A summary of the notations and. examples of many 
computer system described in these notations are available in 
the book cited earlier. 
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SIMULATOR listing 

SIBFTC MAIM 

logical first 

COMMOf-a /XX/ ERP sSCLCiCsTCUST ^bALKvCosTP sTOTF 
COMMON /X.1 / FIRST? MR Jr'S ? I FI G 
COMMOm /X13/ XMlLa?XM2D?X-'SS.,.Xr ALsXMFD 
C OM M 0 / X 2 C' / N R 9 I P M C H , I P R M T I >.) P T 

COMMON /X15/ CLMA/ '.‘CLFIM 9 , ( 50 ? lu ) ■; N I F oM CH ?NOMAX > AM I F ? NGM 
common /X16/ OTCHCl( 15) sOTLilMK 1=: ) "OTCHLD{ 15) .NOCH, 

1 SMAXsSi-'ilH 

common /X5/ XflAs-X'iPsXMRvXi^s .'CM ,PPuS !.M9TCU.rTp 
COMMON /X25/ iSP 
COMMON /X53/ I PC 
dimension PS (A. 10) 

IPC=:1 

c 

IF IPC IS ZERO YOU GET COMP STAT PRINTED IN TABULAR FORM 
C 

rewind 2 

IPRMT=1 

FIRST=:oTRUE. 

ISP = 1 
REVlINi.) 1 

R EAD 20 0 9 NST 9MB, MF , MS , NB 9 NF 9 N S 

DO 20 I =MB9MF9MS 

NOCH=I 

PAUSE 1111 

DO 20 J=NB9NF9NS 

NCH = J 

NR = 0 

14 NR=NP+1 

M = NST^f(NP-l )+l 
I FIN oLEo NCH) NQM=M. 

IFIMCh ^LEo M) NQM=NCH 

200 FoRMAT(7I3) 

IFdPRNT ,;EG.. 1) PRINT 10,1'!!' , 

10 FORMAT ( IHl 9 40X 9*RUfl NUMEEp =-'5 - 9 I 4/4UX 9 1 3 ( 1H* ) / / ) 

PRINT 2'0l 9'NP ,N-9NCH 9NOCH 9,NQM 

201 iFORMATI 5X9813) ' 

CALL DATAIf.' 

ISP = 0 

IFIERR.LToO.O) STOP 
CALL SIMUL 
FlRST=aFALSE« 

IF ( lOPTv oEQ. 1) PRINT13,NR 
13 FORMAT (20X9*RUN *9l3 9->^ COMPLETED^d) 

29 IFINR oGEo NRUNS) GO TO 15 

GO TO 14 
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15 IFdPNCH oEQo 1) PUNCH 16 »XM ID , XM2D sXMSC , XM4D , XM5D 

16 F0RMAT(5012) 

IFdOpT oEQc 1) call PARC'DT 

20 continue 

REWIND 2 

IFdPC oEQc 0) STOP 
Do 41 I=MB,MF 5 M 5 

DO 41 J = NB?NF'»NS 
DO 42 IK=l94 

READ (2943) ( P S (iK , JO > JK= 1 » MRUNS ) 

42 PUNCH 44? ( PSd K> JK ) 9 JK=1 sNRUNS) j I 9 J 

41 continue 

43 F 0 RMAT( 8E10o3) 

44 Format ( 4Ei0o3 9 2x 92 1 3 ) 

CALL EXIT 

END 

SIBFTC DATAIN 

subroutine DATAIN 
DIMENSION name ( 5 ) 9 JTP ( 8 ) 

logical first 

common /XX/ ERR9SCLCK9TCUST9BALK9CUSTP9T0TF 
COMMON /XI/ FI RSTsNRUNS, IFLG 
common /X2/ ITYPEdFIN9TASK 
COMMON /X3/ TARV 9 TBET 

common /X5/ XMA9XMP9XMR9XMS,XMT9PPER9N9TCUSTP 
common /X7/ SUMA(2095) 

COMMON /X 8 / SSUMA(2095) 

common /X13/ XMID 9 XM 2 D 9 XM 3 D 9 XM 4 D 9 XM 5 D 

COMMON /Xll/ PARAMd5 99 ) 9 FREQ(159lO) 

COMMON /X14/ CHClCK(50) ,CHIDLE(50) 9CHLOAD(50) »QL(50) 9 
1 QB(50) 

COMMON /X15/ QLMAX 9 QLMIN 9 0M( 50, 10 ) 9 N I F 9 NCH 9 NQMAX 9 AN I F , NQM 
common /X16/ 0TCHCl(15) 90TCIIWT( 15) 90TCHLD( 15) >N0CH9 
1 SMAX 95 MIN 

COMMON /X17/ MFJC(50) 9NPF(50) 9 NFJ 9 XMF 
common /X20/ NR9IPNCH9IPRN1 9 IOPT 
COMMON /X21/ I T PR 9 PR J 9 PR T 9 PR J P 9 PR TP 

COMMON /X51/ PBUSd30) ,WORK( 130) ,TJ0B(13J) 9WIPSd30) 9 
1 WOPSdSO) 

COMMON /X25/ ISP 
IF( ISp oEQo 0) GO TO 140 
IF(oNOTc FIRST) GO TO 14 
C 

c****** read type 0 CARDS FOR IDENTIFICATION 
C 

READ 11 9 NRUNS 9 NPROB 9 ID 9 MON 9 IYR 9 NAME 

11 FORMAK l294dX9l2) 95 A 6 ) 

print 12 

12 Format ( / 2 X 9 120 ( iH^) / ) 

print 13,NAME, ID,M0N9 IYR 9 NPROB 


non 
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13 Format ( 15X9546 53 { 13 5 . IH/ ) -sIS) 

PRINT 12 

14 READ 1 65 ( JTP { I ) 9 1 = 1 9 ) 9 IFl C9 IPNCH 9 IPRMT 9 lOPT 

all variables ARE I I'i I T I aL I ZED 
.'Mf I flap is set to one for PE INITIALIZATION 

16 Format ( 401 2D 

IF(IFlG oNEo 1) G(J TO 15 
14u COMTImLE 
5CLCK=P . 

PRJP = 0, 

PPTP = n,. 

6ALK = p.. 

ERR = 0 . 

TCUST = ''. 

TOTF = ri-, 

CUSTP = 0..- 
TCUSTp=0o 
QLMAX = r. o 
QLMIN=loOE 38 
N0MAX=10 
DO 50 1=1,15 
DO 49 J=l99 

49 PARAM( I 5J)=0c. 

Do 50 J=l9l0 

50 FrE0( 1 9 J)=0.-, 

Do 60 1=1,20 

DO 60 0=1,5 
SUMA( I 5 J)=0. 

60 SSUriA( I ,0 )=0o 
DO 73 1=1,20 
SUMA ( I ,4) =1 .nE 38 
SSUMA( I ,4) =lo0E 36 

73 COMTIMUE 

DO 61 1=1,50 
CHL0A0( I ) =0 
CHCLCkI I )= 0o 
CflIDLC{ I )=0., 

QC ( I ) =0 . 

QL( I )=0c. 

DO 61 0=1,10 
QM( I 5j)=-1.0 

61 continue 

Do 68 1=1,15 
0TCHCL( I )=0 o 
0TCHWT( I )=0c 
OTCHLDC I )=0o 
68 continue 

DO 62 1=1,130 
PBUS( I )=0c. 
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62 


WCPIU I )=0„ 
WIPS{ J ) =60 

WCPS ( T ) =0.- 
TJOB ( T ) =1)., 
CoNTImUE 


C 

C 

15 

44 

37 

C 

C 


SP -NI 


0) Tr 

•'•.NDYI 

(X 

'■■•ID) 

pr.iDY? 

(X 

M2D) 

3NDY3 

( A 


'■.T,jDY4 

( X 


^NDY5 

^ I 
! 

(X 

: . 5 D ) 

TV PE 

T 

DATA 


IF( JTP( 1) nCO.O) GO TO 20 
RE:AD 44 9 Mi, I 'Of" 

Format ( 214 ) 

format ( / 20 X '.-n^nOu of PROClSPORS =* 
’ print 37 pN 

pEAO TYPE 11 DATA 


18 /) 


20 IF( JTP(2) .EQ„0) GO TO 25 

READ 36 sXMA s XMP s XPiR ? XMS ^XMT •> PPEP 
PRINT 18 9XMA,,X1'P aXMP s, XiM S v X,"’T 
18 FORMAT( 1 o'X 5-;,-,N£AM IMTEl.’ ARRIVAL TIME =-x- , F 1 0 4 » 5 X , 

1 -if-MEAN PROCESS TIME =-'c , F 10 4/ 12X •j-X'MEAl'! PRIORITY RATE =-:^s 

2 FIOpAjSXj-x-MLAN storage ACCoTIKt FIO .>4/ ) 

XMA=loO/XNA 

C 


C-sfx- 

C 

25 

pFAD type III CARD 



IF(jTp(3) EOo U) GO TO 26 

READ ?1 ITYPEi, TFIN kTASK 




21 

format ( I2 93Fr., 3) 

I F ( I type a EO -a ) P R I NT 22 , TA .K 

IF( ITYPt..Ena’) PRINT 23,TFIM 



22 

F 0 R M A T ( 2 X a- S I M U L A T 1 1 • N C 0 N t - C l . 

PAPAi 

JOBS =x-,F12o4/) 

23 

r 

FOPMAT(20X;,;;-C.INLLATI ;;N CONT :ol 

P-,'''A-'') 

iCTlR TIME =^aF12o4/) 

C*-!f 

r 

read type IV CARS 



'w 

26 

IF(JTP(4) .EQo 1) GOTO 30 
ITPR=n 

GO TO 27 



30 

READ 21 9 ITPRjPRT ,PRJ 




C 

C*^f”-*** read type V CARDS 
C 


27 


IF(JTP(5) oEQo 0) GO TO 40 



n n n n n n 
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READ ?3 5 XMF 3 N i I" •) NEH 
38 FORMAT ( F8 -.3 s2I H ) 

ANIF^WIF 

print ASsMIFjNCH^XMF 

A3 format ( /20X 9-x-MOo OF FILES IN SY-STEfi =-5i- ? I 7 ? 5X s oOF* » 

1 channels =-x-3 it 3/23X 3*HEA!' NO., OF FIlFS REQUESTED FOR* s 

2 ^search =-::-3Fi0o3// ) 

XMF=1oO/XFF 


TYPE Vl CARDS 


SElO 


FOR 


'.NL'OM GEM 


ATQRS 


40 IF(JTP{6) uEOc 0) GO TO A1 

46 FORMAT (601?) 

IFCIFlC. .-NEo 1) >',Q TO 41 

I F ( FIrlT ) ptAD 46 jXiMIDo X^-.2D 9 XH? D » XM4D s XM5D 
CALL SNOYl(XMlD) 

CALL SNDY2(XM2D) 

CALL SNDY3 (Xi''i3D) 

CALL SNDYACXiMAD) 

CALL 5NDY5(XN5D) 


type VII DATA OUTPUT CHANNEL INFORMATION 

41 IF{JTP(7) ,E0o 0) GO TO 42 
R EAD 65 9 NOCH , XMN 9 XMX 

65 Format ( 129 2F3o 3 ) 

print 67 9 NOCH 

67 FORMAT ( /20X 9 'HNOo OF OUTPUT CHANNELS =^( 915 /) 
print 99 9XMM 9XMX 

99 FORMAT! lOXs^OUTPUT TIME =-';-F 1 0 „ 3 9 2 X s *T0* 9 F 1 0 u 3 9 *P ERCENT*/ ) 
SMAX=XMX*XMP 
SMIN-=XM'i*XiMP 
C 

C****** TYP[ VIII data 
C 

42 IF{JTP(8) oEQo U) GC to 9') 

PRINT 32 

32 FGPMAT(/30X9-:'-INTERVAL L I' 'i i T S |- OP H J SToSP AM S-;:-//aX 9 *CODE* 9 

1 20X9*RIGHT open interval Lii'UTG*/) 

READ 16 9 NH I ST 

IF(NHIST oGTc IS) MHIST=15 

DO 35 1=19NHIST 

READ 369 (PARAM ( I 9 J ) 9 J=l99) 

36 FORMAT! IOF8C3) 

print 33 sI 9 (PARAM ( I 9 J ) 9 J = 1 oR ) 

3 3 FORMAT! 10X9 13 9 9 ! 2X9 F9o 3) ) 

35 continue 

99 RETURN 

END 


ilbFTC SIMUL 

subroutine 5IMUL 

C 0 ' ■ I F I N /XX/ E R R » S C L C K T C U S T 5 E A L K 9 C U S T P » T 0 T F 
COMMON /XI/ FlRSTsf'RUNS 5 IFLG 
/X 2 / iTYPE.TFIRs T ;oK. 

COf'lMCRl /X3/ TAPVSTGET 

COMMCO' /Xl'i/ CHCl+K(50) , CH I OlE ( 5 0 ) ',.01-10030(30) jOL(3n) , 

1 QB ( 3-: ■' 

CGMMOM /X^i/ ITPP.»P’"J?PPT..PPJP9:"^TP 

Common /X 5 i/ pbu 5 (i?.c) sOo.riU 130 ) -i-TjODd^ 3 ) 9 Wips(] 30 ) . 

1 WOPS ( 130 !) 

10 CALL .'^FRVL (TIKT 9 JPR.., 2) 

CALL GENFIL 

tbet=tint 

sclck=sclck+tint 

TAFV=pCLCK 

IF( ITpP.EOoU ) GO TO 15 
IFCITpF oEC". 1) GO TO 18 
C 

CORE 1 IS PRINTING BY JOP CONTROL 
C 

17 FORMAT ( /15X 9*SAN,PLE OUTpUT AFTER ^^.Elb^Od- UNITS OF^d 
1 TIMFSt /) 

IF ( (SCLCK-PRTP ) -LToPRT) GO To 15 
PRTP=SCLCK 
PRINT ITsSCLCK 
20 CALL OUTPUT 
GO TO 15 

18 IF ( ( TCUST-PPJP ) oLT. PRJ) GO TO 15 
PRJP=TCUST 

PRINT 19-.TCUST 

19 FORMAT ( /15X ?-;5-SAMPLE OUTPUT AFTER idF15o3s-:c- ARRIVALS^F/) 
GO TO 20 

15 IF( Jpp^ E0'>1 ) CUSTP = CUSTp + l , 

CALL PPOCES 

IF(£RP cLTo Od return 

IF( ITYPEoEQd.) GO TO 12 

T M A X = Y ('^ A X ( S C L C K 9 C i - i C L C I d P b L o ) 

C 

***.?{■•> simulation ends dY TIF'E CONTROL 
C 

IF(TMAXdTdFIN) GO TO 10 

11 IF(SClCKcGEoTFIN) go to 13 
CALL APRVL (TINT,JPR,1) 

sclck=sclck-ftint 

BALK = 0ALK-F1 o 
GO TO 11 
C 

c****** simulation ends by no. of JOl.S processed 

c 



n n 
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12 IFdCsjST oLTo TASK) GO TO 10 
C 

UPDATE pEO'JIpED V/-,RIAdLES FOR THE LAST TIME SEGMENT 

13 Continue 
CALL OUTPUT 
RETURN 

End 

SIBFTC PpOCEc 

SU&ROUTIMC PROCLG 

COMMO^l /X;</ ERRdCLCF dCUSTvUALK 5 CUsTp <, T jTF 
COMMON /K3/ TARV.TL-UiT 

COMMOm /X5 / XHA ? XI-’P 5 XMK 9 X' . 9 X.'.T 9 F P L'"^ 9 N 9 T OUST P 

CCl'S*IOM 7X15/ QLMAX 90L:;IN9r': (5^'9lO) 9 N I F 9 NCH 9 NOMAX 9 AM I F 9 MOM 

CQ^'i^';0^) 7X17/ MFJC ( 50 ) 9NPF ( 50 ) 9 NFJ 9 XMF 

CO'-'MON 7X517 P6Uo(130) sl'ORKdEO) 9TJOL;(13G) 9WIPS(130) 9 
1 WOPS (TOO) 

CALL SCHLO 
DO 13 1=1 9 NCH 
JFR=NPF ( I ) 

IF(jFp oGTo 0) GO TO 11 
CALL ERROR U 6 O 9 O) 

GO TO 13 

11 XIDlL=Oc 
VMIT=pBUG( JFR)-SCLCK 

C 

C#4Hi-s^-5!^c Tlr'lE SPENT IN GCHEDULd'G (OVER HEAD) IS MOT ACCOUNTED 
C 

IF(WAIT uGE. Oo) GO TO 12 

xidle=-wait 

WAIT=n= 

12 CALL COLCT (WAIT,!) 

CALL HISTO (iJAITsl) 

C 

CODE 1 IS WAITING TIME OF JOO.S FOR f-’ROCESSORS 
C 

PL.US( jFR)=pbUS ( JFP )-!-Xn'LE 

13 continue 
CALL DPUMAC 

if(Erp olTo Oc) return 

CALL OUTUUF 
C 

TO FIND T(-;E time OF COMPL'-T i ON OF TUr..'- JOU 
C 

J = 1 

JFR=MPF ( J ) 

TC0MP=PBUS( JFR) 

DO 15 d-lsNCH 
JFR=MPF ( I ) 

IF(JFR cGTo 0) GO TO 18 


■ I II I 


I 


I 


i 
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18 IFdCo^'iP .GE, PbUS(JFP)) GO TO 15 

TCOMP=PGUS ( JFR ) 

15 CO.OTIMUE 

tpan=tcomp-^tar\/ 

CALL rOLCT (TP an, 2) 

CALL HIGTO (ToANd) 

return 

EmO 

SIBFTC GEMFIL 

SUt5R(;uTINE GLNFIL 

CORiMOL' /XX/ ERRsLCLC|-,.dC'JGT> XL L •> CUGT p , TOT F 
COMi'lC'N /Xl3/ Xi'llL'sXHTUDXM'iLsXi-lAU?/!' TO 

COiRfiOM /X15/ QLHAXdiLf-il''' oOi'U jOs 10 ) W I F a NCH sNQFlAX !> AN I F p NOM 
C G M 0 N / X ]. 7 / N F J C { 5 0 ) p N p F { 5 0 ) ■, i r J p X ' 'i F 
TIM = 0.:' 

ANCH-NCH 
Do 10 I=]pNCH 
10 Mr-jc(i)=o 

CALL DRAND (SEEDpRMUMd) 
xmad=seed 
C 

NO. CF FILES EXPONL'NTI ALLY DlSTRlLUTED VITH MEAN XMF 

C 

IF(RNUIM ^GT. ('„) TlM = -ldXMF-!;-ALOG(PNUM) 
mfj=ttm+(.l.5 
C 

aTlEaST one file is needed fop a job 

C 

IF(NFj uEOc. u) mfj = i 
IF (M hJcGT.NlF) NFJ=MIF 
ANFJ=NFJ 
Totf = TOTF+;\NF J 
1=0 

12 1 = 1+1 

CALL DP AMD (DLEu^RNU:., J) 

JC = PNUI"0<-ANCH+u .5 
IF(JC .EJ 0) JC=1 
IF(JC oGT. NCH) jC=NCn 
MFJC( jC)=r!FJC{ JC)+1 
C 

MFJ FILES APE GENERATED WITH UNIFORM DISTRIBUTION 
WHERE NCH IS THE NOv OF PHYSICAL FILL IN T + E SYSTEM 
C 

IF( I oLTo nFJ) go TO 12 

return 

END 
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SIBFTC SCHED 

subroutine SCHED 
dimension iNDPdsC) jREEWISO) 

COD;MON! /XK/ ERR.5CLC(.dTCUST5EAL!:..CUsTp,T ITF 
COO. CON /X5/ X-'i.A sXOiP sX; iP ^XMOsXMT 5 P PE 'd ,\i , T CUSTp 
CO.’ihOT'' /Xl5/ ■'Li'-iAX '•OL' ilN "O' ( 5 U 9 10 ) ; ■•'•I F * ''CH 9 NQi-i.lX 9 A .N I F sNQi'l 
CoOMCm /X]?/ :iFjC(EO) 9r.'-'F(d ) ji'.FJ?)'' F 

COf-li'C'O' 7X51/ PdU... d" 0 ) , liOf-K { ISO ) ;TJ03 (txO) ->DTP0{1X0) j 
1 V/OPG dOO ) 

DO 12 I=l9N 

12 RE?-'( I )=PBU1( I )-SCLC^• 

C 

C-i5--);-**-K--"- TO ApPAl'CE T'-IE PFC'CES.SCm- - 0 1 fnE I ■•h:f El C I NS ORDER OF 
C-ic- THEIp LEFT OVER -AjOY 'i. I;' SEpVIC'IOG EApl lEP LALIS 

C 

DO lA I=1,N 
KJ = 1 

5ML=REi'''i (KJ ) 

DO 13 J=l9M 

I F ( REm, ( J ) OE o SML ) GO TO 1 J 
SML=PEM ( J ) 

KJ = J 

13 continue 
lNDPd)=KJ 
REM( K J ) =loOE38 

14 Continue 

c 

nOM PROCESSORS ARE ALLOTcp PFR JOB 
C 

IC = 1 

15 IXP=1 

17 NPF ( IC) = INDP ( IXP ) 

IC=IC+1 

IXP=IXP+1 

IFdC oGT. MCH) GO TO IS 
IF(IXP cGT.., NQM) GO TO 15 
GO TO 17 

18 return 

END 

SIBFTC DpUMAC 

subroutine drui-.ac 

COMMON /..<X/ ERRpSCLCr 9 TCUS I 9 BALK 9 CLLd P 9 TOTE 
Common /x 5/ xr , a sXmp » xmr ,xms 5 xj-it sRPErdi’". sT custp 
COMMON 7X14/ CHClCK ( 50 ) 9CHTDLE( .50 ) 9 CHLOA D( 50) sQL ( 50) 9 
1 QB(50) 

common 7X157 OLMAX sQLHlN sGR ( 50 , 10 ) 9 N I F d CH 9 1'dMAX 9 A N I F , NQM 
common 7X177 MFJC ( 50 ) 9 NPF ( 50 ) jNF J 9 XMF 

common 7X517 PBUSdSO) 9W0RK(130) ,TJ0DC! ’.Q) sWIPSdOO) , 

1 WOPS (130) 


■I urn •HI 1 1 HI in I 


on on 
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Do 11 I=1,^KH 
DO 11 J=1?M0MAX 

iFior-Kisj) .lTo Ho) go to 11 

IF(QM(JsJ) ..GT-. SClCK) oO T>.' 1] 

IF(QL(1) --<GT. Or.) OL ( I ) =CL( I )-l 
0H( I 5 J ) =-l 

11 COMTir-iUE 

DRU:;AC EXPECTS P::.!JS TO Hr.LL’ TH'f TI. c /-,T '''HIGH REQUEST 
for file is iIADF 

Do lo I=1,NCH 

IFCr-'.FjC { I ) .£o„ U) GO TO ]6 
AmF = MFJC ( I ) 

C 

all LOGICAL FILES I CME PilYSIC, L FILE IS ALLOTLD TO 
oNE E AMD AI-iF IS LO OF LOGICAL FILES PER PHY.'jICAL FILE 
C 

R IDlL=0o 

CALL OVHEAD (Til) 

T I IIE = Xi iS+T M 
DO 12 J=l9N-MAX 
IF(QM(I»J) c.GEu n..) GO TO 12 
JF = J 

GO TO 13 

12 continue 

QB( I )=qB( I )+i„ 

JF = 1 

13 IPR=NPF(I) 

WTM = ChCLCK( I )-PBUS( I PR) 

IF(WTM cGT„ Oo) go to 14 

p idle=-wtli 

CHIDLE( I ) =CHIL'LE ( I ) +RlDLr: 

UTM=0. 

GO TO 15 

14 OL( I-)=OL( n+lr, 

C 

VMITIMG TIME DoLS MOT INCLliUL SERVICE TIME 
C 

15 QM(I ^JF )=CHClCK( I ) 

C 

THE selected FILE RFEPS O'V- FILLINC, MA AND iiB ALTERNATE 
UNTIL all the AMF FILES ARE PROCESSED 
C 

ZM = TImE+XVP-’?'AMF 

PBUS( IPR)=PBUS ( I PR )+WTfvi+ZM 

CHClCk { I )=CHClCK { I )+ridle+zm 

WORK( IPR)=W0RK ( IPR )+AMF*XMP 
WIPS(IPR)=WIP5{ IPR}+WTM 

c****** collects the no of physical files 


■ I I ■ III 


II I ■ ■ I 


III 
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TJOB( IPR )=TJ06 (IPR)+1oO 
CHL0AD( I ) =CHL 0 AD ( I ) h-AMF 

16 continue 

DO 17 I=1,NCH 

I F ( OLUAX u LT „ QL ( I ) ) '^■L ';AX-OL ( I ) 

IF(OLMIN .GTo OL(I)) QLMIN=OL(I) 

17 CQiMTlNUE 
qlen=olmax 

CALL TMST (0LEN,5) 

CALL HISTO (GLEr,i.;i;) 

return 

End 

$I3FTC OUTBUF 

SUBROUTINE OUTBUF 

COMMON 7X5/ XI;a »XHP .XMf- X"-’S ^XLT -.PPL- R ,N ^TCiJSTP 

C 0 N! lO 0 M 7X13/ X M 1 D <1 X M 2 D . >: M L D , X I- 1 A D 5 XUS D 

COf'lMCM 7X167 OTCHCl ( 15 ) ^OTvi IWT ( 1 < ) oOTCHL J ( 1" ) jNOCH, 

1 SMAX?'^MIN 

COMMON 7X157 0 LMAX 9 OLi 1 1 M ? ( 5U , 1 0 ) 5 N I F j NCH » NOMAX 9 AM I F , MOM 

COMMON 7X177 i'lFjC ( 50 ) ^MPF ( SO ) sNFj 9XMF 

COMMON 7X517 P BUS ( 1 3U ) 9 VJORK ( 1 30 ) sTJOBdiU) !.WIP.S(130) 5 

1 WOPS (130) 

DIMENSION ALOC(IOO) 

5X = SMAX-SiMI N 
MAL = 1 

XMIN=0TCHCL(1) 

DO 12 I = li.NOCH 

IFCXMIN cLEo OTCHCl(I)) GQ TO 12 
NAL=I 

12 continue 
C 

ARRANGE PROCESSORS FOR Sf'RVICE 
C 

DO 13 I=l9NCH 
NR=MPF( I ) 

IF (NR oEo. 0) GO TC 14 
ALOC( I )=PBUS(NP) 

GO TO 13 

14 ALOC(l)=-loO 

13 Continue 

c 

C4HH(-*4-r SERVICING the PPGCESSOR REQUESTS FOR OUTPUT 
C 

Do 20 K=l9NCH 
XMIN = l.-OE38 
NPR = 1 

DO 15 1=1, NCH 
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IF{ALOC(I) oLt„ OoO) GO TO 15 
IF(XMTM cLT.. ALOC(I)) GO TO 15 
MPR=I 

Xf'llN = ALOC { I ) 

15 CoWTUilJE 

ALOC(h!PR)=-l. u 
• XlDLi: = ‘'c 

WAI T=0TCHCL ( IIaL ) -PbUS ( I'iPP ) 

IF(WmTT oGF„ Ou) (.0 TO 17 

XlDL.:=-ViAIT 

WAIT = Ck, 

17 WOPS ( MPR ) ( MPR ) +WAI T 

OTCHWT ( NAL ) =OTCHWT ( MAL ) +X JOLt 
CALL DPAMI, ( SEED!)P,MUf! ,3 ) 

T IM= ( 5M I N+£X^'-RMU!'‘. ) ^J-FLOA T ( f'iF JC ( i; ) ) 
PBUS(MPn)=PL.U.S (NPP )+WAIT+Tl!'i 
OTCHCI. ( NAL ) =0TCHCL ( fSAL ) +X lULL + T I M 
OTCHL D ( NAL ) =OTCHLl' { NAL ) +T ' 

20 continue 

RETURN 

end 

SIBFTC ARPVL 

subroutine ARRVL (TINT? JPR^K) 

CONMON /XX/ ERRi»SCLCK.9TCUST5L;ALK^CUSTp!,TOTF 
COMMON /X5 / XMA s XMP » XMR » XM^^ s XMT 5 PPEP ? N s.TCUSTp 
COMMON /X13/ XM1 DjXM2DjXM3D!.XMAD»Xi' 15D 
JPR = 0 
TINT = r., 

CALL DPAMD ( SEED sRNUHs 1 ) 
xmid=seed 

I F ( RNUM uGT ... 0 u ) T I NT = “1 , /Xf'.;A-N‘'AL0G ( RNUi."' ) 
TCUST=TCUST+1. 

IF(KoEOol) return 
MPSSN=0 

Y=EXP (“XMP) 

X = l.-u 

3 CALL DPAwD ( SE ED sPMU! 1 ? 3 ) 

XM3D = ,5£ED 
I F ( X-Y ) 6,1 -I 
7 X = X^fRMUM 

NPSSN=NPSSN+1 
GO TO 3 

6 PER=FlOAT(NPSSN) /(TCUSI-TCLlTP) 

IF(PEr oLEc PPER) return 

TCUSTp=TCUST 

JPP-1 

c 

SEE GASP BOOK PAGE 100 FOR THESE ALGORITHMS 

c 

return, 

end 
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SIEFTC DR a NO 

subroutine DrAND (SEED?RNUM9 IG) 
common /XI?/ XH1D!)XM2D.XH:^D9XM4D3XM5D 

X=lr,0 

c 

X IS DUMMY ARGU’vSmT WHICH IS IGNOrfD GY RANDOM GENS 
C 

IF (16 oGT.. 5) RETURN 
Go TO 

1 RN.UM = pf!DYl ( X ) 

SEED=WNDY1(X) 

PETUP'-a 

2 RNUM = Rf->DY2 ( >■ ) 

SEED='v.(('JDY2{X) 

return 

3 RNUM = PMDY3 ( X ) 

SEED = vJMDY3(X) 

4 PNUM = P'''DY4 ( X ) 

SEED = WNDYA{X) 

RETURN 

5 PMUM=PNDY5 ( X ) 

SEED=WNDY5 (X ) 
return 

End 

SIBFTC YMAX 

FUNCTION YMAX ( 5CLCi< , CHCLCF , PBUS ) 

DIMENSION CHCLCK(50) 120) 

ymax=sclck 

DO 2 I=l?5n 

IF (YMAX oGEo CHCLCK(I)) GO TO 2 
YMAX = CHClCK( I ) 

2 Continue 

DO 3 1=1 5 130 

IF(YM«.X -GE.. PUUS(I)) Go TO 3 
YMAX = Pi5US ( I ) 

3 continue 

return 

END 

siBFTc Error 

subroutine E.PPOR ( NCrP E 9 KFLo ) 

common /XX/ FRRs,SClC'/ .TCUST^ bMLK »CU:3TP 9 TOTF 

print 29 NCODE 

2 FORMAT ( /30X j^Ef'ROR TYPE ^-r.-i/) 

IF (KFLG^EQoJ ) RETURN 

C 

irrecoverable error TUph o,P' TF.RMI;; -,TI0K FLAG 

c 

E RR = " 1 o 
PRINT 3 

3 FORMAT! /25X 9^tDUMP OF SYSTEM STATUS -!'.'/23X , 28 ( 1 H-) // ) 
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CALL OUTPUT 

RETURf'i 

END 

SIBFTC OVHEz-vD 

SULROUTINE OVHEAO (Tl) 

CO.MHOM /X5/ XNA ^Xi'iP iiXfiR s>XM£ .XMT sPPER jN ^TCUSTp 
COMMOfi /X13/ X,VlD>.XU2D!.X'i3L5XM4L''. KPSO 

COMMOf! /XI 5/ OLMAX .OLMIN ( 50 , 10 ) > N I F !, .N CH !. ^'QMAX s AN t F >MQM 

C- 

ROTARY FILE CONCEPT lo ASSUMED 
C 

T['i=«0o 

IF(MIF cLE^ NCH) RETURN 
CALL DP AND ( SEED,RNUM»2 ). 

XM2D=5EEb 
TM = RMUN*XLE-\'0 o25 
C 

25 PERCENT CHOSEN FROM PDP 11 DMS 
C 

return 

End 

SIBFTC COLCT 

subroutine COLCT (X»RN) 

COMMON /X7/ SUMA(2095) 

NbsRM 

IF((NoGToO) oANDo (K,.LE.20)) go TO 12 

CALL FPROR (11 90) 

return 

12 SUMA(N9l)=SUMA(N9l )+X 

SUMA ( M 9 2 ) =SUMA ( N ,2 ) +X-^-X 
SUMA(M!.3)=SuMA (N,3 )+l. 

SUMA ( N 9 4 ) -AMIN 1 ( SUMA { N ;• 4 ) ? X ) 

SUMA ( M 9 5 > = AMiAX 1 ( SUMA { N 9 5 ) 9 X ) 

RETUIsM.! 

END 

SIBFTC HISTO 

SUBROUTINE +ISTO (X 9 IC) 

COMMON /Xll/ PARAM(1599) 9FnL"C(159iO) 

IF(IC,lL-. 15) Go TO 2 
CALL ERPOP (III 9 O) 

return 

2 IF(XoGE.pARAM( IC 9 I ) ) GO TO :■ 

J = 1 

GO TO 6 

3 DO 5 1=299 

'IF(X oOEo PAPAM( I C,I)) GO TO 5 
J = I • 

■GO TO 6 ■ . ' 1 

C 

CLASS intervals LEFT CLOSED AND RIGHT OPENIN INCREASING 
C^f*4f*** order AS many as 9 CAN BE GIVENoTOTAL 10 CLASSES 


1 
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5 CONTINUE 
J = 10 

6 FpEQ{ TC<,J) = FnEC'( IC,J)+] oO 
RETURW 

END 

a- IE FTC TMST 

SuapOUTlNE Tl'.ST (X.K) 

COMNON /XX/ ERRs-T'CLCK s TCDE T . EAL K, CUMP ■.> TOT F 
C O iM 0 N / X 8 / 5 S U f' /, ( 2 0 , 5 ) 

I F ( ( I'; . G T ,, 0 ) .A iJ D V ( N . L L- . X 0 1 ) CD T 0 ? 0 

CALL EPPOR (12;.0) 

RETURN 

20 TT = CCLC}<‘-SSUl'iA ( N 5 1 ) 

SSUMA(Ns 1 )=bCLCK 

S-S'UMA CL,2) = psiir:.' (15,2) +'X * TT 

5 5 U.N A ( IM » 3 ) = L S U f-l A ( N » ?. ! + X -'x ^ ‘ T T 
bSUfiM ( M K A ) =AMl Ml ( SSUf/iA ( n . A ) i^X ) 

FSUmMN ) =A2;AX1 (SSLIMAIN oS ) ?X ) 

RETU^<^' 

END 

IIBFTC STAT 

GUIIPOUTINC &TAT (IC 9 JC) 

Common /x7/ suma{20s.l) 

COMMON /X3/ SSUMA(20>5) 

common /Xll/ PARAM (15,-.’l ‘.FREi^dSg 1C ) 

COMMON /X20/ NR>IPNCHdPPI'iT,IOPT 
C 

jC IS 0 FOP COLCT STAT ANL. 1 FOR TidT STATISTICS 
C 

IF{jC.,£Qol) GO TO 12 
I F ( SUM A ( I C 9 3 ) r. L E u t, ) 60 T 0 1 7 
AVLr? = SUi'‘lA ( I C 9 .! > / SUMA ( I C ) 

V AR = SUM A ( I C d ) / SUMA ( I C 9 3 ) - AV L R-»f * 2 

SMALL=SUMA { ICs A) 

t;iIG=SIJMA(IC93) 

GO T(,' T? 

17 AVEn = 0.- 
VAP=Oo 
BlG=Un 
SmALI. = Oc 
GO To 13 

12 AVER=SSUMA( IC92)/SSUMA( IC,1) 

VAR = SSUMA( ICd > /SSUMA { I C 9 1 l-AVER-iHJ- ^ 

small=ssumauic ,A) 

■ ■b.IG = SSUMA( ICsS") ‘ 

13 IF-( IPRNT' d'Qo 1) PRINT 1 1 9 AVER , VAR s B I G 9 SMALL 

11 , format (/20X9'i^MEAN --Sd E 1 2 o 5 9 5X 9*VAR I ANCE :=^^,AlZo5/ 

' 1 20X9'5^MAXIMUM =*9E12c,5 9 5 X 9 ^-MINIMUM =-:^ 9 E12 » 5 / ) 

IF( lOpToEQol ) WRITE(1,16) AVER , VAR 9 L I G 9 SMALL 
16 FORMAT! 8E12o5) 


I ■ Hi p ■! Ipi p ?|pi 



IFdPRMT a'Q. 0) r'ETURi'; 

PRINT 15 

15 Format ( //ZSX, :. DI STRloUTIOf.'-;:-// ) 

PRINT 145{Ff-'E::-( iCsJ) ,J=ldo) 

u Format { 2H /doiFiuddH /)/) 
return 

EI'ID 

TlLiFTC PAPOPT 

subroutine PAPOPT 

COHMON /Xi/ FIPOTsd'JNS-, IFL G 

coi'iMOn /Kill/ PBus; i:i' ) sVIor'-: ( no ) » tjou( i?.j ) dViips ( ipo ) > 

1 WOPS(TJU), 

Equivalence i pous . par ) 
common /XL 3/ I PC 

D I MENS I ON PAP ( 10 560 ) 5P0 ( A) 5 JXF; ( 4 ) 

DATA JXR/159525»41/ 

MUN=1 

rewind nun 
DO 13 I-lsNRUNG 
DO 13 J=l*12 
jSa ( J-1 1-id'+l 
v)F~ Jb + 3 

12 READ(NUN*14) (PAR( I 5l<) 5;.. = JS5jl- ) 

14 Format ( 8E12.. 5) 

JS=JF+1 

JF=JF+8 

13 P,EAD(mUN 9]4) (PARC I 9lO 5 K=JN'jJF) 

print 60 

60 Format ( ///iox»*1oTransit tiimE LoChanm utl i«PR0c utl 
1 ^'r 4 oOTCHCL UTL-^V) 

QO 61 I^lsNRUMS 
DO 62 J-I54 
JK,-JXr? ( J 1 

62 PS( J)=PAR( dJK) 

WrITLCEsi^E) (PS(J) sJ = 1?4) 

IFCIPC d-;Oa 1) PRINT 63 S (p:,.( J ) 5 J = 1 r 4) 

63 F0RMAT( 10X54(El2<.4n.Xl ) 

61 continue 

32 Format (SFiUc, 3) 

print 15 

15 F 0 P A T ( 1 H 1 5 3 0 X 5 id 0 D 5 T A T I b T I C N. / 3 5 X s T R A F' 6 1 T T I E -!<- 5 4 0 X 5 
1 ^WAITING TIME-;d) 

print 16 , - • , , 

46 . Format 1 2 ( ] 4X 5 ^i-MFAM^<- 5 1 ox ? i am* , ox s^rnaxi m* 5 1 ox , *:-'i I n I m* 5 

.1 )/) 

Do 17 1=1»NRUNS 

17 print I 85I 5 (PAR( I » J) , J^.1,,8;) 

•18 FpRMAT( 5Xd354El4o55 5X94£l4o5i) 

'PRINT 19 '■ ■ ' 

19 format ( IHl 5 30X 5-^CHANMEL STAT I ST I C5*//35X s^UT I LI ZAT ION* 5 4 OX 
■' . 1 *L0AD SHARING-*-/) 
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PPIMT 16 

DO -20 1 “1 i.!''RUf',S 

20 print 10»I c, (PAR( IsJ) ,J = Qa6) 

PPTMT 21 

21 Format (///30x?^h:hamnll ouEU!i-K-»ACXs-;."OM overflow ^/) 

PRINT 16 

DO 22 l=lsMRUNS 

22 PRINT IBi. I s IpAR( I ■■ J) » J=17..?A ) 

PRINT 23 

2 3 FORMAT ( IMli. 3UX vi: PROCEOOv''- '.TATI STIC ..-if-Z/O DX , -Ji-UTL I Z AT I ON-::- s 
1 A (• I X L 0 A D S H /'■. P I N G / S 
PRINT 1.6 
DO 24 1=1qNP'JW':. 

24 print 18iiI 5 (PAR( NJ) ,J=:?T,32) 

PRINT 23 

2.5 FrRM,/AT(/// 3 3X!.-!:l'AITINi' FQi? F I LE-::- Ci ''X » -!'■ W,\ I T I NG FOP,*!. 

1 ^iBUFFFi.-'*/ ) 

PRINT 10 

DO 20 I = 1.N[UJN.5 

26 PRINT lSi.1 5 {PAR( I oJ) .0=33.40) 

PRINT 27 

27 FnPMATdHl //30X s i:OUTpUT CHANnCL OTAT I ST t CS*Z /0 5X f 
1 *UTILIZAT J0N*>45X i^iLOAD SMAR I NG* //) 

PRINT 16 

DO 20 I = 1jNRIJNS 

25 PRINT 18. r » (PAR( I ,0) ,0 = 43 .48) 

PRINT 29 

29 r'ORMATdHI //3X » -id d ACS 2o SCLCK SubALK 4«,SUM SoQDM.AX*!. 

1 ^SoODMIN 7.QLMAX tUQLMlN ■»//) 

DO 3u I=1.MRUNS 

30 PRINT 18.1 , (PAR( dJ) »J=4'd36) 

RFTURN 

END 

SIBFTC OUTPUT 

sudroutinf output 

CO'^MON /XX/ ERR 9 5CLCK » TCUS I s DA LK j CUOTP , TOTF 

common /y 3 / XM A » XMP V XMP, » XNiS •.■ XiNf » RPEP . N 9 TCUSTP 

COMMON /X14/ ChCL-^K(.50) ,CHIL;LU(D0) . CHl .C.AD ( 50 ) ,QL(':iO) , 

1 08(50) 

CO-' MOM /X15/ QLMAXi.OL:'llM 90 d':.0,10> . N I F . KCI-h MQMAX . Afl I F , NCM 
Cr.MMOM /XI6/ OTCHCL ( 15 ) .OTCIIUT { IS ) jQTCi-IL )( 15) sNOCld 
1 SmAXsSMIN 

COMMON /X20/ NR- iPNCHdPF'NT.d'PT 

. common;- /XD l/ PbUS( 1.10 ) ,W0KI<( IGO ) .TJOS { 130 ) !.WIPC ( 130 ) . 

1 woP^hs-Ci) 

n; IFdPRNT -EOo 1) PRINT I 0 

10' FORMAT ( 26X 9*J06 ST AT I ST ICS*/24X > 1 8 ( 1H~ ) //20X ddRANSi T* 9 
1 -wTIME of JOBS*//)' 

CALL STAT (290) 

IFdPRNT oEOo 1) PRINT 11 

11 FORMAT! 20X,-:'-VJA I TING TIME OF JODS FC* A PROCESSOR*/) 

CALL STAT (1,0) 

QBT!=0 
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TAC.S = n,. 

OBMIM = 1uOE .'d 
Do 15 I = 

1 1- ( ni MmXc |_T 01 ,( 1 )) .'■.bMAX = Oo ( I ) 

IF (nl,:Mlf.! oGT, Ob { I ! > Ol':'! = ( I 

OBT = OP.T+OU { I ) 

15 T AC5j = TACS+CHL' ' AD ( I ) 

C 

C * * -:f c i I A I'' u D. L U T I L I Z A T 1 'Z. f'! 5 T m T : D T I C 6 

C 


IF( iPr.'TnF.rioi) PPIF’’'' DDsi-'Di 

23 Format ( /Z5X ^ ^-chaobfi Drii,T_ no.') rjT.ArisTics*/’23X!i34( iH-n. 
1 / 2 0 X n' F 0 R f i O' ... T) F C H.'- 1 ' i. D =•' '■ ■ f ' ■ / f 1 
DO 2 2 I-imOH 
OLT = C., 

UTL= ( CHCLC'I-,U I ) -LI 1 1 Dl.t ( I !)-:'■ 1 L-U HLL P ( I ! 

C.ALL COLCT '(UTL<.3) 

CALL HiSTO (ijTLjI) 

PE('.L = ] n0uL-V‘'LHLOAL'( I 1 /TACb 
CALL COLCT (PERL'^^+' 

CALL HISTO (PF-RLi'A) 

1 F ( QbT CT 0 u ) QLT=Ot'. ( I ) ^nOO :. /OBT 
CALL COLCT (QLT»6) 

CALL HISTO (OLT»6) 

22 CONTImUE 

CALL .STAT(3,0) 

IF( IPf;MToCO» 1) PRIN'^ 2^nTAi,.j 

24 Format ( 25X n(-CHANNCL load snap, IMG*/?.3X 924 ( 1H-) /25X> 

1 if'FOR MOo OF access s^n'ElZu?./) 

CALL CTAT(490) 

IFdPpNT ..NL„ 1) 00 TO 30 
PPIOT 12,01 MAX sGLMl'l^ 

12 Format ( /25X 9-)!- 1 noex fili orATnnics x-/23X!.30 ( ih-) s 

1 //20X 9^f-QUl£U£ STATIST I CS*/'.U;a 9 AbSOLUTc MAXIMUM ='!nEl2o;; 

2 /3UX 94;-AbSOLUTL MINIMUM =*9El2ni.//) 

30 CALL STAT (5,1) 

IFdPRNT.EO. 1) print 25 , OB; ' 'IX , M I F 

25 FO^.MAT ( /25X 9 -li-CVERFEOW IN 0 '-'.aTR I X '■■ / 2 3X , 3 1 ( I H- ) / 20X , 

1 ^absolute maximum =--dE12o3/20X9 

2 -xAE^SOLUTE mi mi, mum =*9F12-,3//) 

CALL STmT(69D) 

C 

PROCFSSOR UTILIZATION 
C 


Do 16 I=l9N 

UTL=0„ 

IF<PbuS(I) uGT^ 0^) UTL = W(;RK ( I) -JdOOp/PBUS ( I ) 
CALL COLCT UUTL ,7 ) 

CAlL -HISTO UUTL ,7) 

,RERJ = TJOBd )*100o0/T0TF 

CALL HISTO (PERJwS): . , , ' 
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16 

17 


IB 


19 

2 0 


C 


1 


1 


], 


CALL COLCT (PERJsB) 
WIPLt T )=V..'Ipo{ I ) /TJOL( I ) 


CALL COLCT (vaPOCI ) ,9) 
CALL HISTO {VJIP6( I ) sP ) 
WOPS( 1 ) =WOPS { I ) /TJOB { I ) 
CALL COLCT ( WOPS ( t ) aiO ) 
C/LL HIOTO {'.iT PC ( I ) s I'" 1 
CONT I NLL 

IFIIPPmT ..Co, 1) PPlfT 




F 0 R M /a ( / 2 X T ■ U T I L I Z /A T I n M r ■ F c - n C E S C o P S - / 2 0 X 2 B ( 1 H - ) / / ) 
CALL F,TAT(7a.i) 

IPdiaMTaad) PPIMT icdap 

FOPMr.T ( /25XT-:<L0AD JLwvp IpG OF PPOCF:. LORb*/ 2?X , 28 ( iH-) // 
20 X 9 -“ 100, PERCENT CO( ' EOPCMDS TO dLlLdia!- FILEC-F/) 

CALL FT AT (id 01 


IF( IPrNT JEOol 1 PPsINT !'■ 

FORMAT ( /ZLX s-fWAI TING TTkC L) I. bTP I O.UT KiM C'= PPOCESCORS -!- 9 


/23X 9 50 ( IH- 1 // ) 

CALL STATlRsQ) 

]F( IPRNToEQ.1) PRINT 20 

F0RMAT(/25X9-):VMITING TIME FOR OUTPUT CHANMEL«-/23X 9 35 (1 H- 
)//) 

CALL AT At (10,0) 


ciHHHHHi- output channel GTATISTiCo 

C 

SUM=Uo 

DO 31 I*=1,N0CH 

31 S(JM~SUM+OTCHlD ( I ) 

DO 3 2 r^UNOCH 

UTL= (OTC|-iCL( I )-' 0 TCHWT ( I 1 ddOO ■. /OTCriCL ( I 1 
CALL COLCT (UTL.ll) 

CALL HISTO (UTLdl) 

PEr;L = OTCHLD( D-iflOUc/SUN 
CALL COLCT (PERL»12) 

CALL HIST0(PERL,12> 

32 continue 

IFdPPNT oL-Q. 1) PRINT 33 

33 FoPMAT( /25XaiOUPUT CHANNEL UT I L I ZAT ION*/ 23X 9 3 0 ( IH- ) / / ) 
CALL ST AT (II 9 O) 

IFdPRMT -EQo 1) PfdNT 349 lUM 

F 0 RMAT( 25X9*0UTPUT CHANNEL LOAD SHAR I NG^d 23X , 30 (IH- ) / / 
1 25X9*]00 PERCENT CORRESPONDS TO -Ji , El 2 n 5/ / ) 

CALL STAT' (IP-sO) 

IFdOpToEbul) WRITE(1,27) TACS , SCLC'X 9 hALK 9 SUM sQDMAX 9 
1 ’QBMIN 9 QLMAX 9 QLMIN 
27 F 0 RMAT( 3E12«5 1 

print I 49 TACS 9 TCUST 9 SCLCK 

14 Format ( 20X 9*total file access =*»ei2o4/ 

1 23X9*F0R NOo of customers =*9£12o4/ 

2 26X,*oVER a period OF TIME =*9E12o4/) 

return 

END 
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